PROGRAM RESTRUCTURING FOR VIRTUAL MEMORY SYSTEMS

by 
Jerry William Johnson

ABSTRACT 



The problem area addressed in this report* is program restructuring, 
a method of reordering the relocatable sectors (subroutine and data 
modules) of a program in its address space to increase the locality of 
the program's reference behavior, thereby reducing the number of page 
fetches required for its execution in a virtual memory system. 

Theoretical upper and lower (optimum) bounds are derived for the
paging performance of programs over all partitions of relocatable sectors 
into pages. 

Program restructuring techniques are developed which use intersector 
reference models based on sector working sets and sector stack distances. 
These intersector reference models identify the local reference behavior, 
and clustering procedures are developed that use this local reference 
behavior to rearrange sectors into pages such that significant 
improvement in paging performance is obtained. 

Results of measurements of paging performance obtained in the
computer laboratory are discussed. The paging performance of programs
restructured by the practical restructuring algorithms is compared with
the theoretical bounds on paging performance.



*This Technical Report reproduces a thesis of the same title submitted to
the Department of Electrical Engineering, M.I.T., on June 15, 1974, in
partial fulfillment of the requirements for the degree of Doctor of
Philosophy.



CHAPTER 1



1.1 Introduction 

In this chapter, the problem of restructuring programs to improve 
their paging performance in virtual memory systems is presented. 



1.2 Motivation 

As the use of multiprogramming and virtual memory techniques has
become more widespread, the performance of paged virtual memory
hierarchies has become more important. The fact that paged virtual
memory systems can be made to perform efficiently at all depends
primarily on an inherent property of program behavior known as "program
locality" [D1,D2,D3,D4]. Program locality arises from empirical
observations that actual programs cluster their memory references so
that, during any interval of time, only a subset of the information
available is actually referenced. If a program is favoring a subset of
its information at some particular time, we should like very much to have
this subset in primary memory. As a result, much of the research effort
made to optimize the performance of programs in virtual memory systems
has been spent devising strategies for page management algorithms that
could maximize the probability of finding in primary memory the
information needed by the CPU at the time it is referenced, thereby
minimizing the number of page fetches. Several studies [B1,B2,D2] have
shown that this probability strongly depends on the reference patterns of
the program being executed, that is, on how distributed in the virtual
address space are the information items successively referenced by the
processor. Generally, the higher the degree of locality of a program, the
higher the performance of the virtual memory system with respect to that
program. However, several comparisons of page replacement algorithms have
been reported [B1,H1,C1], often realizing as much as 30 to 40 percent
improvement from one algorithm to another for certain programs. In
particular, an algorithm has been found [B1,M1] that gives the minimum
number of page fetches for a program. Even though the minimum
replacement algorithm is practically unrealizable, as it requires a
knowledge of the future page references of the program every time a page
fetch occurs, the algorithm is important because one can use it as a
theoretical bound against which the performance of any other paging
algorithm can be compared.
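
To make concrete why the minimum replacement algorithm needs knowledge of
future references, the following sketch counts page fetches when the page
evicted is always the resident page whose next reference lies farthest in
the future. The function name min_page_fetches, the demand-fetch policy,
and the fixed allocation of at least one page frame are assumptions made
for this illustration; they are not taken from the cited work.

```python
def min_page_fetches(page_trace, frames):
    """Count page fetches under a farthest-next-use replacement rule.

    Assumes demand fetching and a fixed allocation of `frames` >= 1
    page frames (both are assumptions of this sketch).
    """
    resident = set()
    fetches = 0
    for t, page in enumerate(page_trace):
        if page in resident:
            continue  # page already in primary memory: no fetch
        fetches += 1
        if len(resident) >= frames:
            future = page_trace[t + 1:]

            def next_use(p):
                # Pages never referenced again rank beyond any real position.
                return future.index(p) if p in future else len(future) + 1

            # Evict the resident page whose next reference is farthest away.
            resident.remove(max(resident, key=next_use))
        resident.add(page)
    return fetches


example_trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(min_page_fetches(example_trace, 3))
```

Because the eviction choice inspects the remainder of the trace, the
count can only be computed after the fact, which is why it serves as a
bound rather than as a practical policy.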

In all the studies of developing page management algorithms to
increase the performance of virtual memory systems, the program's page
reference pattern and hence its locality is considered as a
non-modifiable input to the system. In contrast to the exploitation of
the existing locality of programs by paging algorithms, relatively little
attention has been paid to another important method of obtaining better
performance from virtual memory systems. This method is to increase the
degree of locality of the program to be executed. Even less research has
been focused on developing bounds on the performance improvement due to
optimum program locality.

In this report, we propose to focus most of our research efforts
on the study of program restructuring [C2,H1,D2], a method of rearranging
the relocatable sectors (subroutine and data modules) of a program, to
increase the locality of the program's reference behavior and thereby
reduce the number of page fetches required for execution in a virtual
memory system. The essential idea behind program restructuring to bring
about this localization in its reference behavior is to take sectors of
the program that are used closely together in time and cluster them
closely together in the virtual address space.



1.3 The Nature of Program Restructuring 

The program restructuring methods that have been proposed so far
can be classified along several dimensions. With respect to the extent of
the programmer's involvement, restructuring can be manual or automatic,
depending on whether rearrangement decisions are made by man or computer.
With respect to the level at which restructuring is applied, we can make
a distinction between external and internal reordering. In external
reordering, the sectors which are rearranged in virtual memory are
relocatable sectors of instructions and/or data. Internal restructuring
consists of reordering parts of relocatable sectors with respect to each
other or simply deciding where to insert page breaks in the code [K1,V1].
External restructuring is faster and cheaper since it never requires
reprogramming. With respect to the type of information on which a
restructuring procedure is based, there are static methods, which only
make use of the knowledge of the static properties of the program, and
dynamic methods, which are based on data, collected during execution,
representing the dynamic behavior of the program.

Algorithms for automatic restructuring can be applied at
compilation time if they are static; typical examples are those methods
which construct a graph model of the program to be restructured, whose
sectors are represented by vertices (whose weight is the size of the
sector) and whose arcs represent the transitions (data or control
references), and then cluster vertices according to connectivity
considerations or to the cyclic structure of the graph [B3,L1,R1,V2]. We
are interested in automatic, external program restructuring methods based
on the program's dynamic behavior and in subsequent discussions we will
simply call this program restructuring.

In order to provide more insight into the character of the program
restructuring which we will study, we make the following general
assumptions. A program consists of a finite set of relocatable data and
procedure sectors. These sectors are opaque, since we are concerned with
the interactions among the sectors and we are not concerned with what
goes on inside each sector. The average size of a relocatable sector is
small with respect to the size of a page (between one-tenth and one-half
page size).

Informally, the basic approach to program restructuring is to run 
the program with a set of "typical" input data, record the sector 
reference behavior, formulate an intersector reference model based on the 
recorded information, and then apply a program restructuring procedure 
which uses the model of intersector reference behavior to reorder or 
partition the sectors into logical pages such that the number of
intersector references among sectors in different pages is minimized.

The aim of program restructuring is to increase the locality of
the program's address reference pattern by reordering the relocatable
sectors in virtual memory such that sectors that are needed within a
relatively short time of one another are found in the same logical page
or in logical pages that would otherwise tend to be in primary memory at
the same time. The act of restructuring will tend to create a situation
in which there are either very strong or very weak affinity bonds between
logical pages. The resultant goal of program restructuring is to
minimize the page fetches required by a program during its execution in a
virtual memory system. This is a very difficult goal to achieve because
the number of page fetches is a function of primary memory allocated to
the program, the page size, the fetch and replacement policies, the
sector reference behavior, and the selected ordering of sectors into
logical pages.

In order to pose more formally the nature of the restructuring
problem for any program modeled by a set of relocatable sectors of
specified size and a measured sector trace describing the sector
reference behavior, we need the following definitions.

A program is regarded as a directed graph G of m nodes, of size
S_i > 0, i = 1,...,m. The nodes represent relocatable sectors. Let N be
the page size, such that 0 < S_i ≤ N for all i. Let C = (c_ij),
i,j = 1,...,m be a weighted connectivity matrix describing the edges of
G. The edges of G represent the intersector reference behavior of the
program. With edge (i,j) is associated a cost c_ij ≥ 0 of traversing
that edge. How to choose the best intersector reference model C from the
measured sector trace is an important research problem. However, c_ij
might represent the probability that sector i references sector j, or
c_ij might be the total number of times sector i makes a data reference
or a transfer of control to sector j, or ideally c_ij would represent
the number of page fetches which would occur due to sector i referencing
sector j in a given virtual memory system unless i and j were grouped
into the same page.

Let n be the number of logical pages of the restructured program.
An n-way restructuring of G is a set of nonempty, pairwise disjoint
subsets (pages) of G, p_1, ..., p_n such that
p_1 ∪ p_2 ∪ ... ∪ p_n = G and |p_i| ≤ N for all i, where |p_i| stands
for the size of subset p_i, and equals the sum of the sizes of all the
sectors of p_i. The cost definition for the restructured G is the
summation of c_ij over all i and j such that i and j are in different
subsets (pages). The cost is thus the sum of all external costs in the
partition of G. A restructuring of G is optimal if it achieves minimum
external cost or equivalently maximum internal cost, because the total
cost of all edges is constant.
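
The cost definition can be illustrated with a small sketch. The sparse
dictionary representation of C, the helper names page_sizes and
external_cost, and the four-sector example data are all hypothetical
choices made for this illustration.

```python
def page_sizes(sizes, page_of):
    """Sum the sizes S_i of the sectors assigned to each page."""
    totals = {}
    for sector, page in page_of.items():
        totals[page] = totals.get(page, 0) + sizes[sector]
    return totals


def external_cost(C, page_of):
    """Sum c_ij over all ordered pairs (i, j) placed in different pages."""
    return sum(cost for (i, j), cost in C.items() if page_of[i] != page_of[j])


# Hypothetical four-sector program; C is stored sparsely as {(i, j): c_ij}.
sizes = {1: 2, 2: 2, 3: 1, 4: 3}
C = {(1, 2): 5, (2, 1): 4, (2, 3): 1, (3, 4): 6, (4, 1): 2}
N = 4  # page size, in the same units as the sector sizes

grouping_a = {1: "p1", 2: "p1", 3: "p2", 4: "p2"}
grouping_b = {1: "p1", 3: "p1", 2: "p2", 4: "p3"}

for name, grouping in (("a", grouping_a), ("b", grouping_b)):
    fits = all(size <= N for size in page_sizes(sizes, grouping).values())
    print(name, "fits:", fits, "external cost:", external_cost(C, grouping))
```

In this example grouping_a keeps the heavily connected pairs (1,2) and
(3,4) together, so only the arcs (2,3) and (4,1) are cut, while
grouping_b cuts every arc and therefore pays the total edge cost.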

We can now point to two distinct and difficult problems
associated with program restructuring. One is, given G and C, how to
find an optimum restructuring of G, and the other is how to model the
intersector reference behavior C such that an optimum solution to the
restructuring problem formulated on C gives the minimum number of page
fetches for a virtual memory system.



1.4 Importance of Program Restructuring 

The potential of program restructuring for improving the 
performance of programs running in a virtual memory system can be best
illustrated by citing some reported results. 



1.4.1 Comeau's Results

The first published results of program restructuring to increase
the performance of programs in a virtual memory system were reported in
1967 by L. W. Comeau [C2]. Comeau reports that the ordering of
relocatable sectors of code over virtual pages can have a profound effect
on paging performance. In particular, he found that the number of page
fetches during an assembly could be decreased by a factor of five by
changing the ordering of the monitor modules at load time. Four orderings
of the monitor modules were compared under the same primary memory
constraints and the same paging algorithm. The alphabetical ordering
produced 8580 page fetches, the random order gave 4290 fetches, an order
based on knowledge of the page size and functions of the modules resulted
in 2480 fetches, and an ordering based on the knowledge of the functions
of the modules, page size and a detailed history of intermodule activity
generated while the program was in execution produced 1280 fetches.

A subsequent experiment by Tsao, Comeau and Margolin [T1],
performed on an IBM/360 Model 40 in a CP/40 environment, shows that
paging activity is reduced much more by a good load sequence of operating
system subroutines than by replacement algorithms.



1.4.2 Results of Hatfield and Gerald 



In 1971 Hatfield and Gerald [H1] reported that improvements in
paging performance, on the IBM/360 Model 67, in the range of two-to-one
to ten-to-one can occur by using experimental techniques, based on
information compiled from sector reference traces, to restructure the
relocatable sectors of compilers, editors, and assemblers. This is a
significant reduction in the number of page fetches experienced by
existing, frequently executed programs, but how close it is to the
optimum reduction is currently unknown.

Also, they present an excellent discussion supported by many 
detailed measurements, which shows that the sector reference behavior of 
most programs they examined (especially the system programs: compilers, 
assemblers, editors, etc.) proved to be remarkably insensitive to the 
input data in rather large domains. This is very important because there 
is no merit in tracing a program, massaging the traced data, reloading 
sectors, and measuring changes in paging rates if the improvement only 
holds for the particular set of input data used when it was being traced. 
Fortunately, the relative number of intersector references of many 
commonly used programs is rather insensitive to input data. However, it 
is certainly still true, especially for particular application programs, 
that the uniformity of intersector references over a range of input data 
should be established before sector reordering on the basis of 
intersector behavior is attempted. 

In addition, they reported that program restructuring to increase
the locality in program reference patterns can have a much more profound
effect on paging performance in a virtual memory system than page
replacement algorithms.



1.4.3 Program Design Considerations 

Another technique of increasing the degree of locality of
programs, but certainly not the easiest to accomplish, consists of
teaching the programmers how to design more local programs [B4,B5,G1,M1],
making them aware of the important language translator considerations,
providing them with unambiguous feedback about the paging performance of
their programs and showing them how the system penalizes those programs
which exhibit a poor degree of locality. The typical attitude of virtual
memory system designers may be expressed by Denning [D2] when he states,
"it is not known whether programmers can be properly educated, inculcated
with the 'right' rules of thumb, so that they habitually produce programs
with 'good' locality." Unfortunately, the freedom of the programmers
from the need to worry about physical memory space and its management in
a virtual memory system is a major obstacle to their education in the art
of locality.

Therefore, especially for frequently executed programs such as
operating systems, assemblers, compilers, editors, production programs,
etc., we can see the appeal and the potential rewards of the program
restructuring approach, that is, to design the program without
excessively caring about its locality, and then to rearrange its
relocatable code and data sectors in the virtual address space so as to
make its reference pattern more local.



1.4.4 Related Performance Benefits

If we can reduce the number of page fetches required through program
restructuring, we will get improved performance in several areas:

1. Reduced time spent paging. 

2. Less supervisory overhead spent in main 
memory and paging management. 

3. Better throughput on the average, because a
program will interfere with others less.

4. Better paging operation when it is needed,
because there will be less contention for the
paging device.



1.5 Related Research and the Need for Further Research 

The only comprehensive research in the area of automatic program
restructuring was reported by Hatfield and Gerald [H1]. The essence of
their work can be interpreted in the following context. A program
consisting of m relocatable sectors occupying n logical pages of virtual
memory was run with a typical set of input data and sufficient
information was recorded during the run to produce a complete sector
trace. A complete sector trace is the time sequence of all sector
references (instruction and data references) during program execution.

A "nearness matrix" C for modeling intersector behavior was
constructed from the sector trace. The nearness matrix is an m x m matrix,
whose entry c_ij (1 ≤ i ≤ m, 1 ≤ j ≤ m) is the number of times sector j
followed sector i in the sector trace or equivalently the number of times
sector i referenced sector j during the execution of the program. This
matrix is equivalent to a directed graph G of m nodes where the arc from
node i (corresponding to sector i) to node j has c_ij as its weight.
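
As a minimal sketch of this construction, the nearness counts can be
accumulated in one pass over the sector trace. The function name
nearness_matrix, the sparse dictionary result, and the decision to ignore
consecutive references that stay inside one sector are assumptions of
this example, not details reported in [H1].

```python
from collections import Counter


def nearness_matrix(sector_trace):
    """Count how often sector j immediately follows sector i in the trace."""
    counts = Counter()
    for i, j in zip(sector_trace, sector_trace[1:]):
        if i != j:  # assumption: repeats within a single sector are skipped
            counts[(i, j)] += 1
    return counts


# Hypothetical trace over three sectors A, B and C.
trace = ["A", "B", "A", "B", "C", "C", "A"]
C = nearness_matrix(trace)
for (i, j), weight in sorted(C.items()):
    print(f"c[{i},{j}] = {weight}")
```

Each nonzero entry corresponds to one weighted arc of the graph G.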

No computationally feasible procedure was found to produce and
prove an optimum restructuring of G, based on C, into pages, i.e. one
that minimized the summation of c_ij over all i and j such that sector
i and sector j are grouped into different pages. Instead, heuristic
approaches were used to restructure G. One method used essentially the
largest values of the eigenvectors of C as a basis for grouping sectors
together. Another heuristic approach, which gave slightly better results,
was a procedure which attempted to cluster sectors into pages, under the
constraint that the size of each cluster be no greater than the page
size, such that the squares of the interconnecting weighted arc distances
between pages were minimized.

The latter heuristic approach is quite similar to the procedure
reported by Charney [C3] which partitions a network of interconnecting
components into groups of components such that the total number of
interconnecting wires between groups tends to be minimized.

As Hatfield and Gerald pointed out, a disadvantage of program
restructuring formulated on the nearness matrix C is that the nearness
matrix contains global information about sector interaction, whereas
paging depends on local reference patterns. For example, consider two
sector reference traces S_1 and S_2. Assume that sectors i and j are
referenced exactly k times in both traces. Let S_1 = α_1 (ij)^k α_2^k
and S_2 = α_1 (i j α_2)^k where α_1 and α_2 represent long sector
reference strings. The value c_ij is k in both cases and c_ji is larger
in S_1. Therefore, the probability that the clustering algorithm will
group i and j together is greater for S_1 than S_2. However, the cost
of not grouping them together is greater for S_2, since the number of
page faults due to the references to j immediately following those to i
will be at most 1 for S_1 for all real memory sizes greater than one and
can be k for S_2 for certain α_2's. In other words, even an optimum
solution of the restructuring problem formulated on the nearness matrix
may not give the minimum number of page faults.
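
The two traces can be checked with a short simulation. The sketch below
assumes, purely for illustration, that every sector occupies its own
page, that replacement is LRU, and that α_1 and α_2 are short strings
over sectors other than i and j; the helper names and the sample strings
are hypothetical.

```python
from collections import OrderedDict


def count_ij(trace, i="i", j="j"):
    """Nearness entry c_ij: how often j immediately follows i."""
    return sum(1 for a, b in zip(trace, trace[1:]) if (a, b) == (i, j))


def faults_on_j_after_i(trace, frames, i="i", j="j"):
    """LRU with one sector per page; count faults on a j that follows an i."""
    resident = OrderedDict()  # least recently used sector first
    faults = 0
    previous = None
    for sector in trace:
        if sector in resident:
            resident.move_to_end(sector)
        else:
            if previous == i and sector == j:
                faults += 1
            if len(resident) >= frames:
                resident.popitem(last=False)  # evict the LRU sector
            resident[sector] = True
        previous = sector
    return faults


k = 5
alpha1 = ["a", "b", "c"]       # hypothetical strings that touch
alpha2 = ["d", "e", "f", "g"]  # neither sector i nor sector j
s1 = alpha1 + ["i", "j"] * k + alpha2 * k
s2 = alpha1 + (["i", "j"] + alpha2) * k
frames = 4

print(count_ij(s1), count_ij(s2))
print(faults_on_j_after_i(s1, frames), faults_on_j_after_i(s2, frames))
```

Both traces yield the same nearness entry c_ij = k, yet with four page
frames the first trace faults once on j while the second faults on every
one of its k references to j.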

Hatfield and Gerald realized that there are many cases where the
nearness matrix alone does not have all the information needed for
producing a good sector ordering and that the ordering obtained by the
restructuring algorithm from the available information is based on
heuristics. Accordingly they supplemented the automatic sector
reordering phase with a hand finishing phase of additional sector
reordering based on complex human interpretation of the program's use of
virtual memory over the course of its execution as displayed via an
interactive graphics package. Even though the reordering phase based on
human decisions provided additional improvements in paging performance,
it can be quite time consuming, and the results are somewhat dependent on
the imagination and insight possessed by the programmer making the
decisions. Furthermore, the absence of any knowledge about the maximum
possible improvement makes it difficult to determine a suitable stopping
point based on some cost-performance criteria.

In order to determine if a new ordering is actually better or
worse than an old ordering, they simulated the paging performance of each
ordering over a range of primary memory sizes and page replacement
policies. Evaluation of sector orderings by simulation can be an
expensive process if many sector orderings are compared.

Based on the current state of research into the problem of
program restructuring as discussed above, we can identify several areas
of potentially rewarding research. We will assume that a program is
modeled by a set of relocatable sectors of specified size and a sector
trace describing the sector reference behavior.



1.5.1 Intersector Reference Models

We need a model of intersector reference behavior C, defined over
the sector trace, that incorporates more of the local reference behavior
of the program upon which paging actually depends than that captured by
the nearness matrix. For example, the probability that a reference from
sector i to sector j will cause a page fault is related to such local
information as the time elapsed since the last reference to sector j and
the number of distinct sectors referenced since the last reference to
sector j in the sector trace. If the time is short since sector j was
last referred to and little virtual memory space was used during that
time, it is probable that sector j is still in primary memory and a new
reference will not cause a page fetch. If the time and space traversed
between references to j are large, it is probable that a page fetch will
occur unless j is grouped into the same page as the referencing sector or
some recently referenced sector. We propose to formulate and investigate
two approaches which seem to have potential for identifying and
quantifying local sector reference behavior which can be used to weight
c_ij entries. These approaches are based on sector working sets and
sector stack distances defined over the sector trace.
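
The two kinds of local information mentioned above can be computed
directly from a sector trace. The sketch below uses the common textbook
definitions (LRU stack distance, and a working set over a fixed window of
references) as assumptions; the precise definitions used in this report
are developed later and may differ. The function names are hypothetical.

```python
def stack_distances(sector_trace):
    """Return the LRU stack distance of each reference.

    A distance d means d - 1 other distinct sectors were referenced since
    the previous reference to the same sector; None marks a first reference.
    """
    stack = []  # most recently referenced sector first
    distances = []
    for sector in sector_trace:
        if sector in stack:
            distances.append(stack.index(sector) + 1)
            stack.remove(sector)
        else:
            distances.append(None)
        stack.insert(0, sector)
    return distances


def working_set(sector_trace, t, window):
    """Distinct sectors among the `window` references ending at position t."""
    return set(sector_trace[max(0, t - window + 1): t + 1])


# Hypothetical trace over sectors A, B, C and D.
trace = ["A", "B", "C", "A", "B", "B", "D", "A"]
print(stack_distances(trace))
print(working_set(trace, 5, 3))
```

A small stack distance, or membership in the current working set,
suggests that the referenced sector is probably still in primary memory,
which is the local behavior a weighted c_ij entry would try to capture.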



1.5.2 Reordering Procedures

Another area concerns finding better procedures for restructuring
or grouping the m relocatable sectors of a program into n logical pages
such that the reordered program achieves or tends to achieve the minimum
external cost formulated on an intersector reference model C. A strictly
exhaustive procedure for finding the minimum cost grouping is often out
of the question. To see this, consider the simple problem of dividing m
sectors into pages containing g sectors each. The total number of
groupings is as follows:

\[ \text{Groupings} = \frac{m!}{(g!)^{m/g} \, (m/g)!} \]

For most values of m and g, this expression yields a very large
number; for example, if m = 40 and g = 4, it is greater than 10^27.
Formally, the problem could be solved as an integer linear programming
problem, with a large number of constraint equations necessary to express
the uniformity of the partition [L1]. However, since it seems likely
that any direct approach to finding an optimal solution will require an
inordinate amount of computation, the quest for better heuristic methods
appears to be the best approach. The first and foremost consideration in
developing heuristics for combinatorial problems of this type is finding
a procedure that is powerful and yet sufficiently fast to be practical.
A process whose running time grows exponentially with the number of
sectors is not likely to be practical.
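
The size of the search space is easy to evaluate exactly. The sketch
below computes m! / ((g!)^(m/g) (m/g)!), the number of ways of dividing m
sectors into m/g unordered pages of g sectors each; the function name
groupings and the requirement that g divide m are assumptions of this
example.

```python
from math import factorial


def groupings(m, g):
    """Number of ways to split m sectors into m/g unordered pages of g sectors."""
    if m % g != 0:
        raise ValueError("m must be a multiple of g")
    pages = m // g
    return factorial(m) // (factorial(g) ** pages * factorial(pages))


print(groupings(4, 2))
print(f"{groupings(40, 4):.3e}")
```

Four sectors in pages of two can be grouped in only three ways, but for
m = 40 and g = 4 the count already exceeds 10^27, which rules out
exhaustive enumeration.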



1.5.3 Sector Ordering Evaluators

A computationally inexpensive evaluator of sector orderings is
needed so that a new ordering can be estimated as better or worse than an
old ordering without simulating paging performance for a primary memory
size and page replacement algorithm.

One theoretical approach recently reported by Sekino [S4] may be
applied, given a sector ordering into pages and the probabilities of
sector i referencing sector j for all i and j, to compute the page fetch
probability. However, a major drawback of this approach is that after
the probabilities of going from one system state to another are computed
(where a system state is determined by the r pages of an n page program
in primary memory, the page being referenced, and the state of the
replacement algorithm), then, even in its simplest formulation, the
solution of r·(n choose r) simultaneous equations is required (a solution
computationally infeasible for values of n and r usually encountered in
real programs).

Another approach relies on the ability to construct a matrix
model describing the intersector reference behavior from the sector
trace, given additional knowledge about the size of available primary
memory and the paging policies, such that the cost of a sector ordering
(i.e. the cost of the interpage arcs cut) produced by a reordering
algorithm is proportional to the number of page fetches expected for
that ordering. How successful this approach or any other
computationally inexpensive approach can be is an open research question,
but the existence of this problem and the potential expense of any
solution point out, in part, the immense value of the next research topic.



1.5.4 Performance Bounds 

The tremendously large number of sector orderings, and the
difficulty and expense involved both in choosing a relatively good
ordering and in evaluating a new ordering as better or worse than an old
ordering illustrate the vital need to have theoretical bounds on the
optimum improvement in the paging performance of virtual memory systems
through program restructuring.



If bounds on the minimum number of page fetches which could occur
during execution of a program for any reordering of relocatable sectors
into logical pages were known, they could be used: to determine whether
or not a given program should be considered for restructuring based on
its current paging performance; to evaluate the results of a
restructuring procedure, whether automatic, manual or both, for a given
program; and to recognize when a good program structure is found.

Automatic restructuring procedures based on heuristics appear to
be the only computationally feasible approach. It is unlikely that any
one procedure will provide near optimum solutions for all programs. One
attractive methodology for program restructuring when bounds on the
optimum performance are known is to have a set of automatic restructuring
procedures available which can be successively applied to a particular
program until a reasonably good solution is obtained. In the case when
no reasonably good solution is found automatically, a decision to
consider manual restructuring and its extent can be made based on the
potential for additional improvement versus its expected cost.

No theoretical work has been reported in the literature to date on
developing bounds on the paging performance in virtual memory systems
that can result from program restructuring. We will present a
formal approach to this problem and some preliminary results in the next
two sections of this report.



It is our objective to develop upper and lower bounds on the
number of page fetches which can occur over all reorderings of sectors
into logical pages of a program. The bounds are to hold for any program
modeled by a set of relocatable sectors of specified size and a sector
trace describing the intersector behavior, and for any two-level virtual
memory system modeled by its page size, the primary memory size available
to the program, and its page replacement and fetch policies.



1.6 Summary of Goals

The goals of this thesis are as follows:

1. Formalize and analyze the effect of the
structural ordering of a program's relocatable
sectors upon its paging performance in
virtual memory systems.

2. Develop theoretical bounds on the optimum
improvement in the paging performance of
programs in virtual memory systems which can
result from restructuring the relocatable
sectors of programs.

3. Develop theoretical bounds on how "bad" the
paging performance of programs can get if the
"worst" ordering of relocatable sectors is
chosen.

4. Formalize new models of program reference
behavior, such as intersector reference models
based on sector working sets and sector stack
distances, and analyze their effect on reordering
procedures for improving the paging performance
of programs.

5. Design and develop practical algorithms for
restructuring programs to improve their paging
performance in virtual memory systems.

6. Perform measurements to compare the
improvements in paging performance produced by
these practical algorithms with the optimum
improvement specified by the theoretical bounds.



CHAPTER 2



FORMALIZATION OF VIRTUAL MEMORY SYSTEMS



2.1 Introduction 

In this chapter a formalization of the fundamental
characteristics of two-level virtual memory systems is presented and
certain performance measures are derived. The purpose of this chapter
is to develop the terminology and the framework necessary to view this
research in its proper perspective.



2.2 Major Parameters of a Two-Level Virtual Memory System

Figure 1 and Table 1 present the major parameters of a two-level
virtual memory system. These parameters can be grouped into three
categories: (1) Configuration, (2) Automatic Management Algorithms, and
(3) Program Behavior.



2.2.1 Configuration

Virtual memory is assumed in this thesis to be implemented by
paging on a two-level hierarchical physical memory system consisting of
primary memory, Mp, and secondary memory, Ss. (Note that we have chosen
the notation Ss for secondary memory, i.e., secondary storage, because
the notation Ms would lead to notational conflicts later in this
report). Each storage device is partitioned into physical blocks called
pages. A page is the basic unit of information transferred between Mp
and Ss. The page size (usually 4,096 or 2,048 bytes) is denoted by N.
Each memory device is further characterized by its random access time
Ti, transfer rate Bi, cost/byte Ci, and capacity in pages |Mi|.
We assume that Tp < Ts, Bp > Bs, Cp > Cs and |Mp| < |Ss|.






Configuration

    1. Mp    is the primary store
    2. Ss    is the secondary store
    3. |Mi|  is the size in pages of the i-th store
    4. Bi    is the transfer rate of the i-th store
    5. Ci    is the cost/unit of the i-th store
    6. Ti    is the average access time of the i-th store
    7. N     is the number of bytes in a page (page size)

Memory Management Algorithms

    1. F     is the fetch algorithm
    2. R     is the replacement algorithm

Program Behavior

    1. A     is the logical address trace



Table 1

Major Parameters of Two-Level
Hierarchical Virtual Memory Systems
[Figure 1, panel A: the processor issues the address trace 
$A = a^1, a^2, \ldots$ to primary memory Mp, labeled (Tp, Bp) and 
(Cp, |Mp|), which exchanges pages with secondary memory Ss, labeled 
(Ts, Bs, N) and (Cs, |Ss|).] 

[Figure 1, panel B: the processor issues the address trace 
$A = a^1, a^2, \ldots$ to a single composite memory Mv, labeled 
(Tv, Bv) and (Cv, |Mv|).] 

Panel C: 

Parameter    IBM 360/67      IBM 370/165 

Mp           Core            Cache 
|Mp|         132 pages       16K bytes 
Cp           $1.53/byte      $8.88/byte 
Tp           375 ns          160 ns 
Bp           2 Mb/s          100 Mb/s 
Ss           Disk            Main Store 
|Ss|         2048 pages      512K bytes 
Cs           $0.04/byte      $0.50/byte 
Ts           8.6 ms          1.44 us 
Bs           1.2 Mb/s        16 Mb/s 
N            4096 bytes      32 bytes 
Tv           885 ns          230 ns 
Cv           $0.18/byte      $0.77/byte 
|Mv|         2048 pages      512K bytes 



Figure 1. 

A). Two-Level Storage Hierarchy System. B). Virtual or 
Composite Memory System. C). Representative 
Parameters for Several Virtual Memory Systems. 
2.2.2 Program Behavior 

The processor, under program control, generates a sequence of 
references to the storage system. The processor references are in the 
form of logical address references or virtual memory references which 
serve to uniquely identify each unit of stored information independent 
of its location in Mp or Ss. The time sequence of logical address 
references is called an address trace, A, and is defined as: 

$A = a^1, a^2, \ldots, a^L$ 

Each logical address, $a^t$, may be separated into a logical page 
reference and an offset within that logical page. This separation 
process is pictorially illustrated in Figure 2, where the set of $2^n$ 
possible addresses is partitioned into $2^{n_1}$ pages of $2^{n_2} = N$ 
logical addresses each. The time sequence of logical page references is 
called a page trace, P, and is defined as: 

$P = p^1, p^2, \ldots, p^L$ where $p^t = \mathrm{integer}(a^t/N)$. 
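
As a minimal sketch of this mapping, the following Python fragment derives a page trace from an address trace and splits one address into page and displacement. The function names and the sample addresses are hypothetical and chosen only for illustration; the page size is assumed to be a power of two, $N = 2^{n_2}$.

```python
def page_trace(address_trace, page_size):
    """Map each logical address a^t to its page p^t = integer(a^t / N)."""
    return [a // page_size for a in address_trace]


def split_address(address, n2):
    """Split an address into (page, displacement) using n2 displacement bits."""
    return address >> n2, address & ((1 << n2) - 1)


# Hypothetical address trace, not taken from the report.
A = [0, 4095, 4096, 8200, 100, 4097]
P = page_trace(A, 4096)          # N = 4096 = 2**12
```

With N = 4096 the integer division and the 12-bit shift give the same page number, which is the partitioning of the address into $n_1$ and $n_2$ bits described above.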
[Figure 2. Logical Address Structure: a) an n-bit logical address; 
b) the same address partitioned into an $n_1$-bit page address and an 
$n_2$-bit displacement, with $n = n_1 + n_2$.] 
Information movement between Mp and Ss is accomplished by 
transferring pages between Mp and Ss. We can analyze interlevel 
movement for address trace A by considering the corresponding page trace 
P. 

One method of constructing a representation or model of a 
complex activity such as program behavior is to first analyze a 
particular characterization and then gradually introduce additional 
detail. In the case of program behavior, it is convenient to begin by 
considering only the address trace and the corresponding page trace. 
Later, we will consider the effect of the program's structure on its 
behavior. 



2.2.3 Automatic Management Algorithm 

Since a processor can service only that portion of a program 
which resides within primary memory, which is relatively small in size, 
the operating system must exercise a special algorithm, called a paging 
algorithm, to keep the "most active" pages of a program in primary 
memory. This is accomplished by transferring pages of the program back 
and forth between primary and secondary memories. The goal of a paging 
algorithm is to maximize the number of times logical information is in 
the primary memory when being referenced. 
The paging algorithm must consist of two basic policies. The 
Fetch policy, F, decides when and which information should be moved up 
from Ss to Mp. The Replacement policy, R, decides when and which pages 
should be transferred down from Mp to Ss. 

Definitions 

1. $Q = \{a, b, \ldots\}$ is a finite set of logical pages. 

2. $P = p^1, p^2, \ldots, p^L$ is a page trace with $p^t \in Q$. 

3. $M_p^t \subseteq Q$ is the contents of Mp at time t. 

4. $F = f^1, f^2, \ldots, f^L$ is a finite time sequence of L sets, 
   $f^t \subseteq Q$, $1 \le t \le L$. 

5. $R = r^1, r^2, \ldots, r^L$ is a finite time sequence of L sets, 
   $r^t \subseteq Q$, $1 \le t \le L$. 

6. $M_p^t = (M_p^{t-1} - r^t) \cup f^t$, $1 \le t \le L$. 

7. F and R are valid if $f^t \cap M_p^{t-1} = \emptyset$, 
   $r^t \subseteq M_p^{t-1}$, and $p^t \in M_p^t$, $1 \le t \le L$. 

The F and R policies are defined to denote a particular 
realization of a paging algorithm for a given trace P. For a page trace 
and initial primary memory state $M_p^0$, an F-policy and an R-policy 
together determine the time sequence of primary memory states that will 
occur as the virtual memory system processes the trace. We will 
consider only valid F and R policies. That is, none of the pages 
fetched at time t, $f^t$, may be in primary memory at time t-1; the set 
of pages removed at time t, $r^t$, must be in primary memory at t-1; and 
the page referenced at time t, $p^t$, must be in primary memory at time t. 
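
The state recurrence and the three validity conditions can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, each $f^t$ and $r^t$ is represented as a Python set, and, like the definitions above, the check does not enforce the capacity $|M_p|$.

```python
def memory_states(fetches, removals, initial=frozenset()):
    """Apply Mp^t = (Mp^{t-1} - r^t) | f^t and return the states Mp^1..Mp^L."""
    states, mem = [], set(initial)
    for f, r in zip(fetches, removals):
        mem = (mem - set(r)) | set(f)
        states.append(frozenset(mem))
    return states


def is_valid(trace, fetches, removals, initial=frozenset()):
    """Check the three validity conditions at every time step."""
    prev = set(initial)
    for p, f, r in zip(trace, fetches, removals):
        f, r = set(f), set(r)
        if f & prev:           # fetched pages must not be resident at t-1
            return False
        if not r <= prev:      # removed pages must be resident at t-1
            return False
        prev = (prev - r) | f
        if p not in prev:      # the referenced page must be resident at t
            return False
    return True
```

For example, with the trace a, b, c, fetching each page on first reference and removing a when c arrives satisfies all three conditions, whereas omitting the fetch of b violates the third condition at time 2.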



2.3 The Virtual Storage Model 

A two-level hierarchical virtual storage system, V, is composed 
of all the parameters described above: 

V = f(<configuration>, <program behavior>, <algorithms>) 

V = f(<|Mp|,Tp,Cp,Bp,|Ss|,Ts,Cs,Bs,N>, <A>, <F,R>) 

The rationale for two-level hierarchical virtual memory systems 
as shown in Figure 1 is to couple expensive, low-capacity, fast memories, 
Mp, with inexpensive, large-capacity, slower memories, Ss, such that the 
composite or virtual memory system approaches the speed of the expensive 
memory and the capacity and cost/unit of storage of the inexpensive 
memory. 



2.4 Performance Measures 

The rationale for a virtual memory system, V, immediately 
suggests three measures of its effective performance. These three 
measures are its effective capacity |Mv|, effective cost/unit, Cv, and 
effective access time, Tv. 
2.4.1 Effective Capacity 

The effective capacity |Mv| = |Ss| is achieved through the 
paging algorithm of the virtual memory system and the constraint that 
all logical pages initially reside in Ss. 



2.4.2 Effective Cost 

The effective cost Cv is defined as follows: 

$C_v = \dfrac{C_p |M_p| + C_s |S_s|}{|M_p| + |S_s|}$ 

The effective cost Cv is seen to approach the cost Cs under the usual 
condition that the size of secondary memory is much larger than the size 
of primary memory. 
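
A small numerical sketch makes the limiting behavior concrete. The function name and the cost and capacity figures below are hypothetical, chosen only to show the trend; they are not measurements from this report.

```python
def effective_cost(cp, mp_size, cs, ss_size):
    """Capacity-weighted average cost per unit over both memory levels."""
    return (cp * mp_size + cs * ss_size) / (mp_size + ss_size)


# Hypothetical costs: primary memory 100 times more expensive per unit.
cp, cs = 1.00, 0.01
small_ratio = effective_cost(cp, 100, cs, 1_000)      # |Ss| = 10 |Mp|
large_ratio = effective_cost(cp, 100, cs, 100_000)    # |Ss| = 1000 |Mp|
```

As |Ss| grows relative to |Mp|, the result moves from about 0.10 toward Cs = 0.01.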



2.4.3 Effective Access Time 

For simplicity in developing techniques for analyzing and 
providing insight into the much more difficult problem of the effective 
access time, Tv, we will first consider a demand fetch policy, Fd. 
Later, our considerations will focus on other fetch policies. 

Assume that, at time t, the processor generates a logical 
address reference $a^t$, which refers to page p. At that point in time, 
the page p may reside in Mp or Ss. Under a demand fetch policy Fd, if p 
is in Mp, the reference proceeds and no page movement occurs. 
Otherwise, if p is in Ss, a page fault or page fetch occurs and the page 
is automatically transferred to Mp and the reference proceeds. If Mp 
is already full, the removal policy, R, must be employed to remove 
some page in Mp to provide space for the new page request. 

Formally, a demand page fetch policy Fd, for a virtual memory 
system V is defined as follows: 

Recall that 

1. $P = p^1, p^2, \ldots, p^L$ is the page trace determined from A 
   and N. 

2. $F_d = f_d^1, f_d^2, \ldots, f_d^L$ is a valid fetch policy. 

3. $R = r^1, r^2, \ldots, r^L$ is a valid removal policy. 

4. $M_p^t = (M_p^{t-1} \cup f_d^t) - r^t$. 



Definition of Fd 



1. If $p^t \in M_p^{t-1}$, then $f_d^t = r^t = \emptyset$. 

2. If $p^t \notin M_p^{t-1}$ and $|M_p^{t-1}| < |M_p|$, 
   then $f_d^t = \{p^t\}$ and $r^t = \emptyset$. 

3. If $p^t \notin M_p^{t-1}$ and $|M_p^{t-1}| = |M_p|$, 
   then $f_d^t = \{p^t\}$ and $r^t = \{a\}$, 
   where $a \in M_p^{t-1}$ and a is selected by 
   the removal algorithm. 



Under demand paging, the primary memory Mp simply fills as 
required by 1 and 2, while the first |Mp| pages are referenced. 
Subsequently, referenced pages are swapped between Mp and Ss as required 
by 1 and 3. 

Let FFp, the number of page fetches from Ss during the 
processing of a page trace P, be defined as the page fetch function and 
its value given by: 

$FFp = \sum_{t=1}^{L} |f^t|$. 

By analogy to the page fetch function, the number of references 
satisfied by Mp is called the page success function, SFp, and it can be 
computed as 

$SFp = |P| - FFp$. 
The effective access time, Tv, of a virtual memory system V, is 
defined as follows: 

$T_v = \dfrac{FFp}{|P|}\, T_s + \left(1 - \dfrac{FFp}{|P|}\right) T_p$ 

The value of the effective access time, Tv, is seen to approach 
the fast access time, Tp, of primary memory as the value of the fetch 
frequency function, FFp/|P|, is reduced toward zero or, equivalently, for 
a given page trace P, as the value of the page fetch function FFp 
approaches zero. Therefore, we see that the value of FFp is a crucial 
measure of the performance of a program in a virtual memory system. In 
general, we wish to minimize the page fetch function in order to 
minimize the effective access time Tv. 
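
The sensitivity of Tv to the fetch frequency can be illustrated with a short calculation. The function name is hypothetical, and the two access times are assumed round numbers (a sub-microsecond primary store and a millisecond-range secondary store), not values taken from this report.

```python
def effective_access_time(ffp, trace_length, ts, tp):
    """Tv = (FFp/|P|) Ts + (1 - FFp/|P|) Tp."""
    fetch_frequency = ffp / trace_length
    return fetch_frequency * ts + (1 - fetch_frequency) * tp


# Assumed access times in microseconds, for illustration only.
TP, TS = 1.0, 10_000.0
tv_no_faults = effective_access_time(0, 1_000_000, TS, TP)
tv_rare_faults = effective_access_time(100, 1_000_000, TS, TP)
```

Under these assumed figures, one fetch in every 10,000 references already raises Tv from Tp to roughly 2 Tp, which is why the fetch count dominates the analysis that follows.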



2.4.4 Page Trace Simulation 

One method to determine the value of the page fetch function 
FFp for a given virtual memory system V is to compute the resultant 
page trace P from the address trace A and the page size N, then 
simulate the paging algorithms, F and R, and record the contents of Mp 
at each step of the page trace. Table 2 illustrates this step-by-step 
simulation, assuming demand paging and LRU (Least Recently Used) 
removal. The contents of Mp are shown ordered to reflect the LRU 
ordering: the top page is the page most recently referenced; the 
bottom page is the page least recently used by the program and is the 
page selected for removal when necessary. 



Virtual memory system V = f(<|Mp|,Tp,Cp,Bp,|Ss|,Ts,Cs,Bs,N>, 
<A>, <F,R>) with parameters 

$A = a^1, a^2, \ldots, a^{12}$ 

P = a,b,a,b,c,c,b,a,a,b,b,a, where $p^t = \mathrm{integer}(a^t/N)$. 
However, we have used lower case letters to represent 
logical page addresses instead of page numbers because 
it simplifies the presentation. 

|P| = 12 

Q = {a,b,c} and |Q| = 3 ≤ |Ss| 

|Mp| = 2 

F = demand fetch, Fd 

R = LRU replacement, R_LRU 

Simulation: 

Time             1  2  3  4  5  6  7  8  9  10 11 12 

Page Trace, P    a  b  a  b  c  c  b  a  a  b  b  a 

Fetch, Fd        a  b  ∅  ∅  c  ∅  ∅  a  ∅  ∅  ∅  ∅ 

Remove, R_LRU    ∅  ∅  ∅  ∅  a  ∅  ∅  c  ∅  ∅  ∅  ∅ 

Mp contents      a  b  a  b  c  c  b  a  a  b  b  a 
after time t     -  a  b  a  b  b  c  b  b  a  a  b 

RESULTS: 

FFp = 4 

|P| = 12 

FFp/|P| = 4/12 

$T_v = \tfrac{1}{3} T_s + \tfrac{2}{3} T_p$ 

Table 2 

Example of Page Trace Simulation to Determine FFp 
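
The step-by-step simulation of Table 2 can be reproduced with a short Python sketch. The function name and the list-based stack representation are illustrative assumptions; the trace and the memory size of two page frames are those of the table.

```python
def simulate_lru(trace, frames):
    """Demand fetch with LRU removal; returns (fetches, removals, states)."""
    stack = []                          # most recently referenced page first
    fetches, removals, states = [], [], []
    for p in trace:
        fetched, removed = None, None
        if p in stack:
            stack.remove(p)             # hit: no page movement
        else:
            fetched = p
            if len(stack) == frames:
                removed = stack.pop()   # bottom of the stack is the LRU page
        stack.insert(0, p)
        fetches.append(fetched)
        removals.append(removed)
        states.append(tuple(stack))
    return fetches, removals, states


P = list("ababccbaabba")
fetches, removals, states = simulate_lru(P, frames=2)
ffp = sum(f is not None for f in fetches)
```

Here `None` plays the role of the empty set in the Fetch and Remove rows, and each entry of `states` lists the contents of Mp from top to bottom.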



2.5 Page Fetch Function Performance Model 

From the above discussion, we observe that several parameters 
of a virtual memory system V = f(<|Mp|,Tp,Cp,Bp,|Ss|,Ts,Cs,Bs,N>, <A>, 
<F,R>) influence the value of the page fetch function, FFp. These 
parameters are the page size N, the program's storage reference pattern 
A, the removal policy R, the fetch policy F, and the size of primary 
memory |Mp|. Therefore, we define 

FFp = FFp(|Mp|,N,A,F,R). 

The significance of all these parameters on the page fetch 
function measure will be considered and investigated. Special emphasis 
will be focused on analyzing and understanding the relationship between 
the program's structure and the logical address trace. 

We will not elaborate in great detail, but it should be pointed 
out that, for hierarchically-structured virtual memory systems of more 
than two levels, say k levels, and demand paging (those studied by 
Madnick [M3]), we can derive the effective page trace and thus the page 
fetch function for paging to the i-th level from level i-1 (level 1 is 
primary memory). To illustrate this, note that the resultant fetch 
policy at level i-1, $F_{i-1} = f_{i-1}^1, f_{i-1}^2, \ldots, f_{i-1}^L$, 
is essentially the page trace $P_i$ for level i. There is an easy 
compression of $F_{i-1}$ to omit the values of $f_{i-1}^t = \emptyset$ and a 
minor relabeling required to adjust for the difference in page size used 
by $M_i$ and $M_{i-1}$ of $P_i = f_{i-1}\,(N_{i-1}/N_i)$. This 
procedure is applicable for all levels $1 < i \le k$, and the goal of a 
k-level memory system is to minimize $\sum_{i=1}^{k} FFp_i \cdot Tp_i$. 



2.5.1 Replacement Algorithm Considerations 

Even though we will be primarily concerned with the effect of a 
program's structure on the value of the page fetch function, FFp, we 
need to consider some important effects of the page removal algorithm on 
FFp. Many removal algorithms have been proposed and studied in the 
past, such as First-In-First-Out (FIFO), Least Recently Used (LRU), and 
Belady's [B1] Optimum algorithm (O). We will define these removal 
algorithms under demand fetch to illustrate how particular algorithms 
may be specified in our general model of removal policies, and to 
establish exactly what these algorithms mean, since they will be 
referred to frequently in the remainder of the thesis. Furthermore, we 
have chosen to discuss this particular subset of removal algorithms 
because they will enable us to present several important and well known 
properties of removal algorithms which will eventually be needed in our 
research. Let: 
1. $P = p^1, p^2, \ldots, p^L$ be a page trace computed from 
   A and N. 

2. |Mp| = number of page frames in primary memory, Mp. 

3. $M_p^t$ = the set of pages in Mp at time t. 

4. $F_d = f_d^1, f_d^2, \ldots, f_d^L$ be a demand fetch policy as 
   previously defined. Recall that the definition of Fd specifies 
   all the mechanics of paging except the page to be selected for 
   replacement. 

The LRU removal policy, $R_{LRU}$, is defined for demand fetch, Fd, 
as $R_{LRU} = r_{LRU}^1, r_{LRU}^2, \ldots, r_{LRU}^L$ where 
$r_{LRU}^t = \emptyset$ if $f_d^t = \emptyset$ or $|M_p^{t-1}| < |M_p|$; otherwise, 
$r_{LRU}^t = a$, where a is the page in $M_p^{t-1}$ which was least recently 
referenced. 

The optimum removal policy, $R_O$, is defined for demand fetch, Fd, as 
$R_O = r_O^1, r_O^2, \ldots, r_O^L$ where $r_O^t = \emptyset$ if 
$f_d^t = \emptyset$ or $|M_p^{t-1}| < |M_p|$; otherwise, $r_O^t = a$, where 
a is the page in $M_p^{t-1}$ with the longest future time to next 
reference in the page trace, P, from $p^t$. If $a \in M_p^{t-1}$ is never 
referenced again, then its time of next reference is assumed to be 
$\infty$. If a page must be removed at time t, and several pages have the 
same longest future time to next reference (i.e., all equal to 
$\infty$), then remove any one of the pages. 
Under demand fetch, the First-In-First-Out replacement policy, 
$R_{FIFO}$, is defined as 
$R_{FIFO} = r_{FIFO}^1, r_{FIFO}^2, \ldots, r_{FIFO}^L$ where 
$r_{FIFO}^t = \emptyset$ if $f_d^t = \emptyset$ or $|M_p^{t-1}| < |M_p|$; otherwise, 
$r_{FIFO}^t = a$, where a is the page in $M_p^{t-1}$ which has been in 
Mp longer than any other page in $M_p^{t-1}$. 
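
The three removal policies differ only in how the victim page is chosen, which the following Python sketch tries to make explicit. It is a minimal illustration, not the thesis's own implementation: the function name and policy labels are hypothetical, ties under the optimum policy are broken by taking the first candidate, and the trace used is the one from Table 2.

```python
def count_fetches(trace, frames, policy):
    """Count page fetches under demand fetch for 'LRU', 'FIFO' or 'OPT'."""
    memory = []      # LRU: ordered by last reference; FIFO/OPT: by arrival
    fetches = 0
    for t, p in enumerate(trace):
        if p in memory:
            if policy == "LRU":          # a hit refreshes recency under LRU
                memory.remove(p)
                memory.append(p)
            continue
        fetches += 1
        if len(memory) == frames:
            if policy in ("LRU", "FIFO"):
                victim = memory[0]       # least recent use / oldest arrival
            else:                        # OPT: longest time to next reference
                future = trace[t + 1:]
                victim = max(
                    memory,
                    key=lambda q: future.index(q) if q in future else float("inf"),
                )
            memory.remove(victim)
        memory.append(p)
    return fetches


P = list("ababccbaabba")
results = {name: count_fetches(P, 2, name) for name in ("LRU", "FIFO", "OPT")}
```

On this short trace LRU and the optimum policy both need 4 fetches while FIFO needs 5; note that the optimum policy inspects the remainder of the trace, which is why it serves only as a yardstick.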

We now present several well known properties of these replacement 
algorithms. 

Lemma 1. 

For a given page trace, P, primary memory size of |Mp| page frames, 
and demand fetch policy, Fd, the number of page fetches using any 
valid removal policy Ra is greater than or equal to the number of page 
fetches using the optimum replacement policy, Ro. The proof of this 
Lemma can be found using various techniques in [A1, M1] and is not 
repeated here. 

Inclusion Property: 

Under demand fetch, Fd, any replacement policy is said to satisfy 
the inclusion property if for all page traces, P, 

a. $M_p^t(1) \subseteq M_p^t(2) \subseteq \ldots \subseteq M_p^t(n)$, 
where $M_p^t(j)$ is the contents of primary memory Mp at time t if 
the size of Mp is j page frames (i.e., |Mp| = j), $1 \le j \le n$. 

b. At any time t after Mp has become filled, there is a strict 
replacement ordering referred to as the "replacement stack," RS, 
$RS = rs(1), rs(2), \ldots, rs(n)$, where 
$rs(j) = M_p^t(j) - M_p^t(j-1)$ for $j = 1, 2, \ldots, n$, and rs(n) is 
the page to be removed next. 

The general class of demand-fetch replacement algorithms which 
satisfy the inclusion property are referred to as "stack algorithms" in 
the literature. The class of stack algorithms, as noted by Denning 
[D1], "contains all the reasonable algorithms." 

Lemma 2. 

The number of page fetches required by any stack algorithm for any 
page trace is a monotonic (non-increasing) function of primary memory 
size, |Mp|, in page frames. To see this, note that if there is a fetch 
at time t for a primary memory of a given size, there must also be one 
at time t for every primary memory of smaller size. The proof of this 
Lemma can be found in [D1, M1]. 
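
Both the inclusion property and the monotonicity of Lemma 2 can be checked empirically for LRU on a sample trace. The sketch below is only a spot check on one hypothetical trace, not a proof, and the function name is an assumption made for illustration.

```python
def lru_run(trace, frames):
    """Return (resident set after each reference, fetch count) for LRU."""
    stack, states, fetches = [], [], 0
    for p in trace:
        if p in stack:
            stack.remove(p)
        else:
            fetches += 1
            if len(stack) == frames:
                stack.pop()              # drop the least recently used page
        stack.insert(0, p)
        states.append(frozenset(stack))
    return states, fetches


trace = list("abcabdacbdcab")            # hypothetical page trace
runs = {j: lru_run(trace, j) for j in range(1, 5)}

# Inclusion: Mp^t(j) is a subset of Mp^t(j+1) at every time t.
inclusion_holds = all(
    runs[j][0][t] <= runs[j + 1][0][t]
    for j in range(1, 4)
    for t in range(len(trace))
)
fetch_counts = [runs[j][1] for j in range(1, 5)]
```

For a stack algorithm such as LRU, `inclusion_holds` is true and `fetch_counts` never increases as the number of page frames grows.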

Lemma 3. 

Demand fetch with LRU removal and demand fetch with Optimum 
replacement are stack algorithms. The proof of this Lemma can be found 
in [M1]. 

We will refer to the above well-known properties several times in 
the rest of this thesis. At this point, we can immediately 
conclude that, for any |Mp| and A, 
a. $FFp(|M_p|,N,A,F_d,R_O) \le FFp(|M_p|,N,A,F_d,R_a)$ from Lemma 1, where 
Fd, Ro are the demand fetch and optimum removal policies and Fd, Ra are 
the demand fetch and any valid removal policies. 

b. $FFp(|M_p|,N,A,F_d,R_{LRU}) \le FFp(|M_p'|,N,A,F_d,R_{LRU})$ and 
$FFp(|M_p|,N,A,F_d,R_O) \le FFp(|M_p'|,N,A,F_d,R_O)$ from Lemmas 2 and 3, 
where $|M_p| > |M_p'|$. 

Due to its simplicity, the FIFO replacement algorithm was used in 
many of the early paging systems. In recent times it has been 
discovered that FIFO has certain disturbing peculiarities, such as the 
possibility that the number of page fetches will double for a memory 
size increase of one page frame [A1, M1]. Hence, FIFO is not a stack 
algorithm, and we cannot claim that, for any A and |Mp|, 
$FFp(|M_p|,N,A,F_d,R_{FIFO}) \le FFp(|M_p'|,N,A,F_d,R_{FIFO})$, where 
$|M_p| > |M_p'|$. 

Thus, we observe that the inclusion property of stack algorithms is an 
important property. 
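
A small demonstration of this FIFO peculiarity, as a sketch: the function name is hypothetical, and the reference string is a commonly cited textbook example rather than one taken from this report. It shows an increase in fetches, not the doubling mentioned above.

```python
from collections import deque


def fifo_fetches(trace, frames):
    """Count page fetches under demand fetch with FIFO removal."""
    queue, fetches = deque(), 0          # oldest arrival at the left
    for p in trace:
        if p in queue:
            continue                     # a hit does not change FIFO order
        fetches += 1
        if len(queue) == frames:
            queue.popleft()              # remove the longest-resident page
        queue.append(p)
    return fetches


trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
with_three = fifo_fetches(trace, 3)
with_four = fifo_fetches(trace, 4)
```

Here three page frames incur 9 fetches while four page frames incur 10, so adding a frame made FIFO worse on this trace.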

Various forms of the LRU replacement algorithm frequently occur in 
contemporary virtual memory systems. Empirically, LRU replacement has 
been found to closely approximate the paging performance obtained by the 
optimum algorithm for many actual programs. The optimum policy is not 
physically realizable since it requires future knowledge about reference 
behavior, but it can be used as a theoretical basis for performance 
comparison with practical algorithms. However, the value of the page 
fetch function, 

$FFp(|M_p|,N,A,F_d,R_O) = \sum_{t=1}^{L} |f_d^t|$, is physically 
realizable [B1] since it does not require future knowledge. 

For any page trace $P = p^1, p^2, \ldots, p^L$ and primary memory size 
|Mp|, Belady has given a one-pass procedure which will compute the value 
that $|f_d^t|$ would take on under optimum removal for any $1 \le t \le L$ 
without any knowledge of the page trace after t (i.e., 
$p^{t+1}, p^{t+2}, \ldots, p^L$). In particular, this procedure determines 
whether $|f_d^t| = 1$ or $|f_d^t| = 0$, but it does not specify of what 
page $f_d^t$ consists. 



2.5.2 Program Structure Considerations 

In this section, we will extend the page fetch function performance 
model to account for the program's structure. 

The programs we consider are defined to consist of a set of 
m relocatable sectors of specified sizes. The structure of a program is 
specified by a particular load ordering sequence of its sectors in its 
virtual address space. This ordering is called a sector ordering, SO, 
and is defined as 

$SO = S_1, S_2, \ldots, S_m$ 

where $S_1$ denotes the first, $S_2$ the second, and $S_m$ the last sector 
loaded in the virtual address space. Thus a program can have m! 
distinct structures, one for each possible sector ordering, SO. 
However, once a sector ordering is chosen, it does not change during the 
execution of the program. Let $|S_j|$ be the size of the j-th sector and 
let $L|S_j|$ be the load address of $S_j$ in the virtual address space of 
the program. If the sectors are loaded contiguously in virtual memory, 
then $L|S_j| = \sum_{i=1}^{j-1} |S_i|$. In any event, we assume that the 
structure of a program is completely specified by its sector ordering 
SO, which is further defined to include the size and load addresses of 
all its sectors. Therefore the sector ordering SO of a program 
specifies the load sequence, $S_1, S_2, \ldots, S_m$, and the values of 
$|S_j|$ and $L|S_j|$ for all $1 \le j \le m$. 

We have previously modeled the program behavior by its logical 
address trace $A = a^1, a^2, \ldots, a^L$ and have shown that the address 
trace A and the page size N are sufficient to determine the page trace 
$P = p^1, p^2, \ldots, p^L$. However, the address trace and hence the page 
trace depend on the particular sector ordering chosen for the program. 
For example, if $a^t$, the logical address referenced at time t, is 
within sector j, then the value of $a^t$ depends on where $S_j$ is in the 
sector ordering SO. 
In order to study the effect of a program's structure on its paging 
performance, we will model a program's behavior by its sector trace. 
The sector trace ST of a program is defined to be the time sequence of 
sector references and is given by 

$ST = S^1, S^2, \ldots, S^L$ 

where $S^t$ denotes the sector referenced at time t. 

Given the logical address trace A corresponding to a specific 
sector ordering SO, the sector trace ST can be easily computed from the 
load addresses of the sectors. Then this sector trace can be used to 
compute the page trace resulting from any program restructuring 
specified by a new sector ordering if the sectors do not cross page 
boundaries. 

In particular, given a program modeled by its sector trace ST and 
its sector ordering SO, the page referenced at time t, $p^t$, is given by 

$p^t = \mathrm{integer}(L|S^t| / N)$, 

where $S^t$ is the sector referenced at time t in the sector trace ST, 
$L|S^t|$ is the load address of sector $S^t$ given by the sector ordering 
SO, and N is the page size. We are assuming at this point that 
individual sectors do not cross page boundaries. 
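
The two computations just described, recovering ST from A and then recomputing P for a new ordering, can be sketched as follows. The sketch assumes contiguous loading; all function names, sector names, sizes, and addresses are hypothetical, and the sizes are chosen so that no sector crosses a page boundary.

```python
from bisect import bisect_right
from itertools import accumulate


def load_addresses(ordering, sizes):
    """Contiguous loading: L|S_j| is the total size of the sectors before S_j."""
    starts = [0] + list(accumulate(sizes[s] for s in ordering))[:-1]
    return dict(zip(ordering, starts))


def sector_trace(address_trace, ordering, sizes):
    """Recover the sector referenced by each logical address."""
    starts = load_addresses(ordering, sizes)
    bases = [starts[s] for s in ordering]
    return [ordering[bisect_right(bases, a) - 1] for a in address_trace]


def page_trace(sectors, ordering, sizes, page_size):
    """p^t = integer(L|S^t| / N) for the given sector ordering."""
    starts = load_addresses(ordering, sizes)
    return [starts[s] // page_size for s in sectors]


sizes = {"S1": 512, "S2": 512, "S3": 512, "S4": 512}     # hypothetical sizes
old_order = ["S1", "S2", "S3", "S4"]
new_order = ["S1", "S3", "S2", "S4"]
A = [0, 600, 1100, 1600, 10]                             # hypothetical addresses
ST = sector_trace(A, old_order, sizes)
P_old = page_trace(ST, old_order, sizes, 1024)
P_new = page_trace(ST, new_order, sizes, 1024)
```

The sector trace is computed once from the original ordering; only the load addresses change when the program is restructured.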
As long as this is true, we can define the restructuring of a 
program as a partition of the relocatable sectors into logical pages. 
In particular, let 

1. $Q = \{S_1, S_2, \ldots, S_m\}$ be the set of relocatable 
   sectors making up a program. 

2. n = the number of logical pages of size N of the 
   restructured program. 

Then an n-way restructuring of the program is defined as a partition 
$\Pi = \{\Pi_1, \Pi_2, \ldots, \Pi_n\}$ where $\Pi$ has the following properties: 

a. $\bigcup_{i=1}^{n} \Pi_i = Q$, and $\Pi_i \cap \Pi_j = \emptyset$ for all $i \ne j$. 

b. $\sum_{S_k \in \Pi_i} |S_k| \le N$ for all $\Pi_i$, $1 \le i \le n$. 

Thus, we see that a partition, $\Pi$, specifies the set of relocatable 
sectors grouped into each logical page. We will assume that the set of 
sectors in $\Pi_1$ are loaded one after another into logical page 1, then 
the set of sectors in $\Pi_2$ are loaded one after another into logical 
page 2, etc., until all the sectors are loaded in the logical address 
span of the program. If $\sum_{S_k \in \Pi_i} |S_k| < N$, then there will 
be a hole or a non-referenced area in the top of page i. 

Therefore, given any partition, $\Pi$, of the relocatable sectors into 
logical pages and any sector trace, we can compute the page trace 
immediately. For example, let $S^t$ be the sector referenced at time t 
in the sector trace and let $S^t \in \Pi_j$; then the page, $p^t$, 
referenced at time t is j. 
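
A partition and its induced page trace can be sketched directly from these definitions. The function names, sector names, sizes, and sector trace below are hypothetical and serve only as an illustration of properties a and b.

```python
def is_valid_partition(partition, sizes, page_size):
    """Check that the blocks are disjoint, cover Q, and each fits in a page."""
    placed = [s for block in partition for s in block]
    covers_all = sorted(placed) == sorted(sizes)         # property a
    fits = all(
        sum(sizes[s] for s in block) <= page_size        # property b
        for block in partition
    )
    return covers_all and fits


def page_trace_from_partition(sector_trace, partition):
    """The page referenced at time t is j when S^t belongs to block j."""
    page_of = {s: j for j, block in enumerate(partition, start=1) for s in block}
    return [page_of[s] for s in sector_trace]


sizes = {"S1": 300, "S2": 500, "S3": 400, "S4": 600}     # hypothetical sizes
partition = [["S1", "S4"], ["S2", "S3"]]
ST = ["S1", "S4", "S2", "S1", "S3"]
P = page_trace_from_partition(ST, partition)
```

With N = 1024 both pages of this partition hold 900 bytes, leaving a 124-byte hole at the top of each page.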

From the above discussion, we observe that the value of the page 
fetch function FFp is determined once we are given a two-level 
virtual memory system V with page size N, a primary memory size of 
|Mp| page frames, any valid page fetch algorithm Fa, and any 
valid page removal algorithm Ra. This FFp is for a program whose 
structure is modeled by any partition, $\Pi_a$, and whose reference 
behavior is modeled by a sector trace ST. FFp can be uniquely defined 
in terms of the following parameters: 

$FFp = FFp(|M_p|, N, \Pi_a, ST, F_a, R_a)$. 

For a particular virtual memory system, V, the values of |Mp|, N, 
Fa, Ra are fixed, and a given reference behavior fixes the value of ST. 
Under these conditions, the value of FFp becomes a function of the 
different partitions of relocatable sectors into pages. However, as 
pointed out in Chapter 1, the number of different partitions becomes 
astronomical for many typical programs. For example, phase 1 of the AED 
compiler has $10^{75}$ different partitions. For such programs it is 
impossible from any practical point of view to determine the best 
program structure (the $\Pi$ that minimizes FFp) for a given reference 
behavior by trying out all partitions. 
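
To give a feel for this growth, the sketch below counts all ways of partitioning m sectors into non-empty groups (the Bell numbers, computed with the Bell triangle). This is an illustrative assumption, not the counting method of the thesis: it ignores the page size constraint and the number of pages, so it only bounds the number of admissible partitions from above.

```python
def bell_number(m):
    """Number of ways to partition a set of m elements into non-empty blocks."""
    row = [1]
    for _ in range(m - 1):
        new_row = [row[-1]]              # each row starts with the previous last entry
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


small = [bell_number(m) for m in range(1, 6)]
digits_for_50 = len(str(bell_number(50)))
```

Even for 10 sectors there are 115,975 groupings, and the count gains many decimal digits as m grows, which rules out exhaustive search.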
From our discussion in Chapter 1, we know that for a given sector 
trace, a partition $\Pi$ which groups sectors into pages such that the 
number of intersector references between pages of the partition is 
minimized may not minimize FFp. In fact, we presented a quite plausible 
sector trace where such a $\Pi$ would indeed be a very bad partition. One 
major goal of this thesis is to find some way of computing the minimum 
value of FFp over all partitions. 

If upper and lower bounds on the value of FFp over all partitions 
can be found, then a particular program structure could be evaluated as 
good or bad. Furthermore, those bounds would provide a means of 
evaluating the ability of practical clustering procedures to produce a 
good program structure. 

The practical drawback of the model developed for the page fetch 
function, FFp, is that sectors are not allowed to cross page boundaries. 
Even though this may not be a serious drawback, we will eventually try 
to extend the model of FFp to take into account the case when sectors 
may cross page boundaries. 
2.6 Sector Fetch Function Performance Model 

We will now define a measure on the information transfer between 
the two levels of a virtual memory system which is independent of the 
sector ordering. In the next chapter, we will employ this measure to 
find theoretical upper and lower bounds on the value of the page fetch 
function over all sector partitions. 

If we assume that the basic unit of information transfer between 
the two levels of a virtual memory system V* is a sector instead of a 
page, we can formulate a measure on the interlevel movement of 
information during the execution of a program which is independent of 
its sector ordering. 

Let FFs, the number of sector fetches which occur in a virtual 
memory system V* during the processing of a sector trace ST, be defined 
as the sector fetch function. The processing of a sector trace in V* is 
called sectoring and can be interpreted similarly to the processing of a 
page trace or paging in V as previously discussed. 

Since the virtual memory system, V*, for sectoring is to be 
modified slightly from the virtual memory system, V, used in our 
discussion of paging, we need to define the notion of sectoring more 
precisely. 
The parameters of a demand sectored virtual memory system, V*, are 
defined as follows: 

1. |Ms| is called the size of the primary memory, Ms. 
   |Ms| is the number of sector frames in the primary memory. 
   The size of these sector frames, say in bytes, need not be the 
   same. Instead we assume that the size of a sector frame in 
   bytes is exactly equal to the size in bytes of the sector it 
   contains. Thus, the size in bytes of any sector frame and of 
   Ms can vary with time if the sector sizes are different, but 
   the important fact is that the number of sector frames in Ms 
   is fixed and equal to |Ms|. In contrast, we should point out 
   that the size, |Mp|, of the primary memory, Mp, for a paged 
   virtual memory system, V, was defined to be the number of page 
   frames of fixed size N in the primary memory Mp. 

2. $ST = S^1, S^2, \ldots, S^L$ is a sector trace of a program. 

3. $F_d = f_d^1, f_d^2, \ldots, f_d^L$ is the demand sector 
   fetch policy of V*. 

4. $R = r^1, r^2, \ldots, r^L$ is the sector removal 
   policy of V*. 

Let $M_s^t$ denote the set of sectors in primary memory at time t and 
$|M_s^t|$ denote the cardinality of this set. 



Now, demand sectoring and the value of the sector fetch function, 
FFs, are defined as follows: 

a. If $S^t \in M_s^{t-1}$, then $f_d^t = r^t = \emptyset$ 
   and $M_s^t = M_s^{t-1}$. 

b. If $S^t \notin M_s^{t-1}$ and $|M_s^{t-1}| < |M_s|$, then 
   $f_d^t = \{S^t\}$, $r^t = \emptyset$ and 
   $M_s^t = M_s^{t-1} + \{S^t\}$. 

c. If $S^t \notin M_s^{t-1}$ and $|M_s^{t-1}| = |M_s|$, then 
   $f_d^t = \{S^t\}$, $r^t = \{S\}$ and 
   $M_s^t = M_s^{t-1} + \{S^t\} - \{S\}$, where 
   $S \in M_s^{t-1}$, and S is selected in accordance 
   with the removal algorithm. 

d. $FFs = \sum_{t=1}^{L} |f_d^t|$. 

The value of the sector fetch function FFs, for any sector trace, 
ST, can be uniquely determined by simulating algorithms Fd and R for a 
primary memory of size |Ms| at each step of the sector trace. 
Therefore, we define 

$FFs = FFs(|M_s|, ST, F_d, R)$. 

It should be clear that the value of FFs will be the same for any 
sector ordering, since the sector trace is independent of the sector 
ordering. It should also be clear, from the definition of |Ms| and 
parts a. and b. of the definition of demand sectoring, that the value of 
FFs for a given sector trace is independent of the sector sizes. We do 
not need to be concerned with the implementation problems associated 
with the variable sector frame sizes of V*, since we will be using the 
sector fetch function only as an analytic tool, and since we can 
determine the value of FFs through simulation without even knowing the 
sector sizes. 
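
A minimal simulation of demand sectoring illustrates why neither the sector ordering nor the sector sizes enter the computation. The sketch assumes LRU as the removal algorithm, which is one possible choice and not prescribed by the definition above; the function name and the sector trace are hypothetical.

```python
def sector_fetches(sector_trace, frames):
    """FFs under demand sectoring, assuming LRU removal over sector frames."""
    resident, fetches = [], 0            # least recently referenced sector first
    for s in sector_trace:
        if s in resident:
            resident.remove(s)           # case a: no fetch, no removal
        else:
            fetches += 1                 # cases b and c: fetch S^t
            if len(resident) == frames:
                resident.pop(0)          # case c: remove the LRU sector
        resident.append(s)
    return fetches


ST = ["S1", "S2", "S1", "S3", "S2", "S1"]    # hypothetical sector trace
ffs_two_frames = sector_fetches(ST, 2)
ffs_three_frames = sector_fetches(ST, 3)
```

The function takes only the sector trace and |Ms| as inputs, so any reordering or resizing of the sectors leaves its result unchanged.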

In the next chapter, the sector fetch function, FFs, will be 
utilized to provide upper and lower bounds for the page fetch function, 
FFp. 



CHAPTER 3 



PAGING PERFORMANCE BOUNDS 



3.1 Introduction 

In this chapter, we will investigate the effect of a program's 
structure on its paging performance in a virtual memory system. Ue Mill 
begin by presenting theoretical upper and lower bounds on the value of 
the page fetch function, FFp(|f1p|,N, Ila,ST,F,R), over all partitions. 

Ha, of relocatable sectors into logical pages for fixed values of the 
other parameters. 

Recall that the value of the page fetch function, 
FFp(|Mp|,N, n,ST,F,R), is the number of page fetches a program Mould 
experience in a two-level virtual memory system, V, with primary memory 
size of | Mp | page frames of size N, using the page fetch and removal 
algorithms, F and R, respectively, for a given sector trace, ST, and 
program structure, II. Ue would like to present a uniform method that 
uould bound the value of the page fetch function, FFp, oyer all 
partitions, Ila, of relocatable sectors into logical pages for "any" fixed 
values of the remaining parameters. The merit of 9uch a uniform bounding 
method would be two-fold. First, it would be applicable to any two-level 



virtual memory system, V, that is, any values of |ttp|, N, F, and R. 
Second, it uould be applicable for any program behavior characterized by 
a sector trace. 

In contrast to a uniform approach, a second approach would be to
bound the value of FFp over all partitions when some or all of the
remaining parameters are constrained. For example, we could assume that
|Mp| = 1, F = demand fetch, R = FIFO replacement and ST = any fixed
sector trace, and then derive bounds on FFp over all $\Pi_a$. Clearly, the
disadvantage of the second approach is that it would have quite limited
applications. However, one advantage of the second approach is that the
additional knowledge gained by fixing certain parameters of the virtual
memory system could permit the use of bounding methods which
would result in tighter bounds. We will investigate both approaches in
this chapter. We have the conviction that a uniform approach over all
virtual memory system parameters and all sector traces is vital for
general applicability. However, given a uniform bounding method, it
would certainly be worthwhile to investigate the possibility of obtaining
tighter bounds when feasible constraints on certain parameters of the
virtual memory system are specified.

We begin by imposing constraints upon the structure of the program,
that is, on the partitions, $\Pi$, of relocatable sectors into pages, and
then gradually remove these constraints.

3.2 Lower Bounds

Let us constrain the structure of a program such that each logical
page contains at most k sectors. In particular, let:

1. Program = $\{S_1, S_2, \ldots, S_m\}$ be a finite set of
m relocatable sectors such that $|S_i| \le N$ for $1 \le i \le m$; that
is, the sector size in bytes does not exceed the page size, N, in
bytes; otherwise, the sector size may vary.

2. $\Pi_a = \{\Pi_1, \Pi_2, \ldots, \Pi_n\}$ be any partition of
the m relocatable sectors into n logical pages where the number
of sectors $|\Pi_j|$ in page j satisfies the constraint
$1 \le |\Pi_j| \le k$.

3. Recall from our definition of $\Pi$ that
$\sum_{S_i \in \Pi_j} |S_i| \le N$ must always be true.

Thus, we are currently concerned with all the partitions, $\Pi_a$, which
restructure a program such that each logical page has k or fewer sectors.
The sector sizes may vary, but the sum of the sector sizes grouped into a
page must not exceed the page size. With this rather flexible constraint
on the allowable partitions, we can find a lower bound for the value of
the page fetch function, FFp, over all such partitions for a given sector
trace and any virtual memory system. We present this lower bound in
Theorem 1.

Theorem 1

Given any two-level virtual memory system V, with page size N,
primary memory size |Mp|, any valid page replacement algorithm Ra,
any valid page fetch algorithm Fa, and any sector trace STa, then, for
any partition $\Pi_a$ of relocatable sectors into logical pages of the
program where each page contains at most k sectors, the minimum number of
page fetches given by the page fetch function model, FFp, has a lower
bound given by:

$$k \cdot FF_p(|M_p|, N, \Pi_a, ST_a, F_a, R_a) \ge FF_s(|M_s| = |M_p| \cdot k, ST = ST_a, F_d, R_o)$$

where the value of the sector fetch function, FFs, is the number of
sector fetches which occur in a two-level virtual memory system V*, with
primary memory size $|M_s| = |M_p| \cdot k$, the same sector trace STa, demand
fetch Fd, and optimum replacement Ro.

Corollary 1a

The size of Mp in bytes is equal to the size of Ms in bytes if each
page is completely filled with exactly k sectors of the same size.

Proof of Theorem 1 

Notation and properties 

Let $ST_a = x^1, x^2, \ldots, x^L$ where $x^t$ is the sector referenced at
time t. For virtual memory system V and FFp let:

1. $\Pi_a = \{\Pi_1, \Pi_2, \ldots, \Pi_n\}$ be any
partition of sectors into the n logical pages of the program
where each page contains at most k sectors. (This
interpretation of a partition will be useful later in this
thesis.)

2. $P = p^1, p^2, \ldots, p^L$ be the resultant page
trace computed uniquely from ST and $\Pi_a$, such that if
$x^t \in \Pi_j$, then $p^t = j$.

3. $M_p^t$ be the set of pages in Mp at time t
and $M_p^0 = \emptyset$.

4. $F_a = f_a^1, f_a^2, \ldots, f_a^L$ be any fetch policy
where $f_a^t \cap M_p^{t-1} = \emptyset$ and $|f_a^t|$ = the number of
pages in $f_a^t$, and $x^t \in [M_p^{t-1} \cup f_a^t]$.

5. $R_a = r_a^1, r_a^2, \ldots, r_a^L$ be any removal policy
where $r_a^t \subseteq M_p^{t-1}$ and $x^t \notin r_a^t$.

6. $M_p^t = (M_p^{t-1} \cup f_a^t) - r_a^t$.

Given the above notation and properties, we will first prove: 
Lemma 4. 

For each Fa and Ra there exists a demand fetch and removal policy,
Fd and Rd, for the FFs model such that
$$k \cdot FF_p(|M_p|, N, \Pi_a, ST_a, F_a, R_a) \ge FF_s(|M_s| = |M_p| \cdot k, ST = ST_a, F_d, R_d).$$

Proof: 

For the FFs model, Fd and Rd will be constructed by forming a
sequence of valid replacement and fetch policies
$(F_1, R_1), (F_2, R_2), \ldots, (F_h, R_h)$ where:

1. $F_1 = f_1^1, f_1^2, \ldots, f_1^L$ and $f_1^t = \bigcup f_a^t$ =
the set of sectors making up the set of pages in $f_a^t$, for
$1 \le t \le L$, where $|\bigcup f_a^t|$ = the number of sectors in the set.

2. Similarly $R_1 = r_1^1, r_1^2, \ldots, r_1^L$, and
$r_1^t = \bigcup r_a^t$, for $1 \le t \le L$.

3. $F_h = F_d = f_d^1, f_d^2, \ldots, f_d^L$ and
$R_h = R_d = r_d^1, r_d^2, \ldots, r_d^L$, for $1 \le t \le L$, where

$f_d^t = r_d^t = \emptyset$ if $x^t \in M_d^{t-1}$;

$f_d^t = x^t$ and $r_d^t = \emptyset$ if $x^t \notin M_d^{t-1}$ and $|M_d^{t-1}| < |M_s|$;

$f_d^t = x^t$ and $r_d^t = b^t \in M_d^{t-1}$ if $x^t \notin M_d^{t-1}$
and $|M_d^{t-1}| = |M_s|$; and

$M_d^t = (M_d^{t-1} \cup f_d^t) - r_d^t$ to satisfy demand sectoring.

For reasons of expediency, the proof of Lemma 4 will be divided into
two parts, Lemmas 4a and 4b.

Lemma 4a: 

If $|M_s| \ge \|M_p\|$, then for $(F_1, R_1) = (F_a, R_a)$, there exists a
valid sequence of sector replacement and fetch policies
$(F_1, R_1), (F_2, R_2), \ldots, (F_h, R_h)$ such that $(F_h, R_h) = (F_d, R_d)$ and
$\sum_{t=1}^{L} |f_1^t| \ge \sum_{t=1}^{L} |f_d^t|$, where $\|M_p\|$ denotes the maximum number
of sectors that could ever be found in Mp. (Note that, in
Lemma 4, $\|M_p\| = |M_p| \cdot k$.)

A proof similar to that of Lemma 4a has been given by [M1] for pure paging
systems. However, we need the following proof to make our extensions
easier to understand.

Proof of Lemma 4a. 

The procedure for constructing $F_j$ and $R_j$ from their immediate
predecessors $F_{j-1}$ and $R_{j-1}$ in the FFs model for $1 < j \le h$ is:

STEP 1.

Choose the smallest t such that $f_{j-1}^t$ and/or $r_{j-1}^t$ do
not satisfy demand sectoring.

STEP 2.

Let $z^t$ be the sector $\{x^t\}$ referenced at time t in the FFs model.

CASE 1.

Now suppose that $f_{j-1}^t$ does not satisfy demand sectoring.

1a.

If $t < L$ and $z^t \in f_{j-1}^t$, then set $f_j^t = \{z^t\}$, and
$f_j^{t+1} = f_{j-1}^{t+1} \cup (f_{j-1}^t - \{z^t\})$. This construction ensures that
$f_j^{t+1}$ contains the sectors already fetched by the
FFp model but not fetched by the FFs model (i.e., deferred sector
fetches).

1b.

If $t = L$ and $z^t \in f_{j-1}^t$, then set $f_j^t = \{z^t\}$.

1c.

If $t < L$ and $z^t \notin f_{j-1}^t$, then set $f_j^t = \emptyset$ and
$f_j^{t+1} = f_{j-1}^{t+1} \cup f_{j-1}^t$. Note that
this allows the reference $x^t = z^t$ to proceed because sector
$z^t \in M_{j-1}^{t-1}$. We have $z^t \in M_{j-1}^{t-1}$ since $z^t \in M_p^t$ and
$|M_j| = |M_s|$ for all $1 \le j \le h$, and since
$|M_s| \ge \|M_p\|$. The last fact, $|M_s| \ge \|M_p\|$, allows $M_{j-1}$ to hold
$\|M_p\|$ sectors; therefore we can always keep a sector in $M_{j-1}$ until
the corresponding page is removed from Mp, as shown in CASE 2 below.

1d.

If $t = L$ and $z^t \notin f_{j-1}^t$, then set $f_j^t = \emptyset$. The reference proceeds
due to the same argument as given in 1c.

In all subcases of CASE 1 note that $F_j$ is valid since
$f_j^t \cap M_j^{t-1} = \emptyset$ for $1 \le t \le L$, that $F_j$ satisfies demand sectoring at
least up through time t, and that
$\sum_{t=1}^{L} |f_j^t| \le \sum_{t=1}^{L} |f_{j-1}^t|$.

CASE 2 

Now suppose that $r_{j-1}^t$ does not satisfy demand sectoring.

2a.

If $t < L$ and $f_j^t = \{z^t\}$ and $|M_j^{t-1}| = |M_s|$, set
$r_j^t = \{b^t\}$ for some $b^t \in r_{j-1}^t$ and
$r_j^{t+1} = r_{j-1}^{t+1} \cup (r_{j-1}^t - \{b^t\})$. Note that since
$|M_{j-1}^{t-1}| = |M_s|$ and $f_{j-1}^t \ne \emptyset$, then $r_{j-1}^t \ne \emptyset$ and the
above operations are always defined. Also, note that $r_j^{t+1}$ is
constrained here and in all subcases to contain only the sectors already
removed in pages by the FFp model but not yet removed by the FFs
model; therefore, a sector will not be removed from FFs until the
corresponding page is removed from FFp. This constraint is enforceable
since the memory size of FFs at each step j, $|M_j| = |M_s|$, satisfies the
relation $|M_j| \ge \|M_p\|$ for $1 \le j \le h$.

2b.

If $t = L$ and $f_j^t = \{z^t\}$ and $|M_j^{t-1}| = |M_s|$, then
$r_j^t = \{b^t\} \subseteq r_{j-1}^t$.

2c.

If $t < L$, and $f_j^t = \emptyset$ or $|M_j^{t-1}| < |M_s|$, then set $r_j^t = \emptyset$
and $r_j^{t+1} = r_{j-1}^{t+1} \cup r_{j-1}^t$.

2d.

If $t = L$, and $f_j^t = \emptyset$ or $|M_j^{t-1}| < |M_s|$, then $r_j^t = \emptyset$.

In all subcases of CASE 2, note that $R_j$ is valid since
$r_j^t \subseteq M_j^{t-1}$ for $1 \le t \le L$, and that $R_j$ satisfies demand
sectoring at least up through time t.

A final comment: if it ever occurs that $z^t \in r_{j-1}^t$ and
$z^t \in f_{j-1}^t$, then simply remove $z^t$ from both. This only reduces the
value of $|f_j^t|$, and it takes care of the case when a page is fetched
into and replaced from Mp without having all of its sectors referenced.
The above procedure, after being applied at most h times, must terminate
with a valid replacement and fetch policy pair $(R_h, F_h)$ such that:

$$\sum_{t=1}^{L} |f_1^t| \ge \sum_{t=1}^{L} |f_h^t|.$$

Hence, Lemma 4a is proved.

Choosing $|M_s| = |M_p| \cdot k$ satisfies Lemma 4a and we immediately get

$$\sum_{t=1}^{L} |f_1^t| \ge \sum_{t=1}^{L} |f_d^t| = FF_s(|M_s| = |M_p| \cdot k, ST, F_d, R_d).$$

Lemma 4b. 
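$$\sum_{t=1}^{L} |f_1^t| \le k \cdot FF_p(|M_p|, N, \Pi_a, ST_a, F_a, R_a).$$

Proof:

$$\sum_{t=1}^{L} |f_1^t| = \sum_{t=1}^{L} \left|\bigcup f_a^t\right| = \sum_{t=1}^{L} |f_a^t| \cdot \frac{|\bigcup f_a^t|}{|f_a^t|}.$$

But $|\bigcup f_a^t| / |f_a^t| \le k$, since $|\bigcup f_a^t|$ is the number of sectors in
$f_a^t$ and $|f_a^t|$ is the number of pages in $f_a^t$. Hence,

$$\sum_{t=1}^{L} |f_1^t| \le k \cdot \sum_{t=1}^{L} |f_a^t| = k \cdot FF_p(|M_p|, N, \Pi_a, ST_a, F_a, R_a).$$

Lemma 4b is proved.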

From Lemmas 4a and 4b, we immediately get
$$k \cdot FF_p(|M_p|, N, \Pi_a, ST_a, F_a, R_a) \ge FF_s(|M_s| = |M_p| \cdot k, ST_a, F_d, R_d),$$
and Lemma 4 is proved.

From Lemma 1 of Chapter 2, we know that
$$FF_s(|M_s| = |M_p| \cdot k, ST, F_d, R_d) \ge FF_s(|M_s| = |M_p| \cdot k, ST, F_d, R_o).$$

From Lemma 1 and Lemma 4 we immediately get

$$k \cdot FF_p(|M_p|, N, \Pi_a, ST_a, F_a, R_a) \ge FF_s(|M_s| = |M_p| \cdot k, ST_a, F_d, R_o)$$

and Theorem 1 is proved.

Proof of Corollary la. 

The size of Mp in bytes is |Mp|*N, and the size of Ms in bytes is
(|Mp|*k) frames * (N/k) bytes/frame = |Mp|*N.

Now, a few comments about Theorem 1. For any given program behavior
characterized by a sector trace, Theorem 1 provides a method of computing
a lower bound on the improvement in paging performance over all sector
partitions into logical pages, when pages are constrained to have k or
fewer sectors. The lower bound given by Theorem 1 is valid for any
virtual memory system. Another beneficial property of Theorem 1 is that
the lower bound is specified in terms of a stack algorithm. We know that
Ro is a stack algorithm from Lemma 3. Furthermore, it is well known
that, for all stack algorithms, the number of page fetches required to
process a page trace can be computed for all primary memory sizes from
one simulation run. For a general discussion of the procedure, the
interested reader should see [M1], and for a particular discussion of a
simulation procedure for the optimum replacement algorithm which requires
only one pass through the page trace, reference is made to [B5]. We
implemented the latter method for the sector fetch function, FFs, and
from one simulation run through any sector trace we were able to plot
FFs(|Ms| = |Mp| * k, ST, Fd, Ro)/k as a function of |Mp|.
Figure 3 conveys the general shape of this bound.

[Figure 3: plot of FFp (vertical axis) against |Mp| (horizontal axis),
showing the curve FFs(|Ms| = |Mp| * k, ST, Fd, Ro)/k]

FIGURE 3.
Lower Bound on FFp Given by Theorem 1



The utility of such a curve as shown in Figure 3 is as follows.
Theorem 1 states that the number of page fetches given by the page fetch
function $FF_p(|M_p|, N, \Pi_a, ST, F_a, R_a)$ for the same sector trace cannot be
reduced below the curve shown in Figure 3 by any reordering of sectors
into logical pages, regardless of the paging algorithms employed.
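A curve of this kind could be tabulated with a sketch like the following. It is a minimal illustration, not the one-pass stack procedure of [B5]: it simply reruns a straightforward simulation of optimum (farthest-next-use) replacement for each memory size. The function names and the short trace are assumptions made for the example.

```python
def opt_fetches(trace, frames):
    """Count demand fetches under optimum replacement (evict farthest next use)."""
    INF = float("inf")
    # next_use[t] = time of the next reference to trace[t], or INF if none
    next_use = [INF] * len(trace)
    last_seen = {}
    for t in range(len(trace) - 1, -1, -1):
        next_use[t] = last_seen.get(trace[t], INF)
        last_seen[trace[t]] = t
    resident = {}  # sector -> time of its next reference
    fetches = 0
    for t, x in enumerate(trace):
        if x not in resident:
            fetches += 1
            if len(resident) >= frames:
                victim = max(resident, key=resident.get)
                del resident[victim]
        resident[x] = next_use[t]
    return fetches


def theorem1_lower_bound(sector_trace, k, max_page_frames):
    """Return {|Mp|: FFs(|Ms| = |Mp|*k, ST, Fd, Ro) / k} for |Mp| = 1..max."""
    return {mp: opt_fetches(sector_trace, mp * k) / k
            for mp in range(1, max_page_frames + 1)}


# Hypothetical six-reference trace over three sectors, two sectors per page.
print(theorem1_lower_bound("abcabc", k=2, max_page_frames=2))
```

Each entry is a value below which no partition with at most k sectors per page can push FFp for that memory size.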

Given that we have a procedure for lower bounding the effects of a
program's structure on its paging performance in any virtual memory
system, an interesting question is: just how tight is this bound for
popular virtual memory systems? If Fa is constrained to be demand fetch
and Ra is constrained to be LRU, FIFO or optimum replacement, then we
could prove, by example, that the lower bound on FFp given by Theorem 1
can be the greatest lower bound for certain sector traces and only a
lower bound for others. We will show that it can be the greatest lower
bound in an example later in this chapter.

We will present and discuss empirical results in Chapter 6 which
illustrate that the bound given by Theorem 1 is indeed rather tight for
real programs running in a paged virtual memory system using demand fetch
and LRU replacement. We will not discuss particular empirical results in
this chapter because we want to relate the results to intersector
reference models, to clustering procedures and to theoretical bounds at
the same time. Intersector reference models will be developed in Chapter
4 and clustering procedures in Chapter 5, and in Chapter 6 we show the
results of applying these methods to restructure real programs such that
the resulting number of page fetches is quite close to the theoretical
bound developed in this chapter for most memory sizes and popular paging
algorithms.

Now consider restricting the fetch and replacement policies of FFp
to be demand fetch and LRU replacement. Under this restriction, can we
replace the optimal sector replacement policy, Ro, of the sector fetch
function, FFs, by some less efficient policy such as LRU and hence
produce a tighter lower bound on FFp over all partitions? This line of
logic led to the following question: is it true that
$$k \cdot FF_p(|M_p|, N, \Pi_a, ST_a, F_d, R_{LRU}) \ge FF_s(|M_s| = |M_p| \cdot k, ST_a, F_d, R_{LRU})?$$

It seems intuitive that the above conjecture would be true even for
the case where each logical page contained exactly k sectors. Here, the
sectored memory could contain exactly the same number of sectors as the
paged memory could contain. Furthermore, at most k sector fetches would
be required to bring into Ms the same information brought into Mp by one
page fault. One might expect that, for programs having a good structure,
i.e., all pages contain sectors that are used together, each page fetch
should produce k sector fetches. Hence, we have divided the value of FFs
by k in the conjecture. In spite of its intuitive appeal, we can prove
that the conjecture is not true for all program behavior. In order to
validate this claim, we present the following theorem.

Theorem 2

For any two-level virtual memory system V, with page size N, primary
memory size |Mp|, demand fetch Fd, and LRU replacement $R_{LRU}$,
there exists a sector trace ST, and a partition $\Pi$ of relocatable sectors
into logical pages where each page contains k sectors, such that
$$k \cdot FF_p(|M_p|, N, \Pi, ST, F_d, R_{LRU}) < FF_s(|M_s| = |M_p| \cdot k, ST, F_d, R_{LRU}),$$
where the value of the sector fetch function FFs is the number of sector
fetches which occur in a two-level virtual memory V, with primary memory
size $|M_s| = |M_p| \cdot k$, using demand fetch Fd, LRU replacement $R_{LRU}$,
and the same sector trace ST.

Proof 

Consider the virtual memory system with the parameters:

|Mp| = 3 pages.

k = 3, i.e., each page of size N contains three sectors.

|Ms| = |Mp| * k = 9 sector frames.

F = demand fetch, or Fd.

R = LRU, or $R_{LRU}$.

Program = {a, b, c, d, e, f, g, h, i, j, k, l}, a set of 12 relocatable
sectors of size N/3.

ST = $(adgjklhiefbc)^2$.

|ST| = 24.
Consider $\Pi$ = {abc, def, ghi, jkl} where A = abc, B = def, etc. Then
for ST = adg jkl hie fbc adg jkl hie fbc

    P       = ABC DDD CCB BAA ABC DDD CCB BAA
    Fd      = ABC D∅∅ ∅∅∅ ∅A∅ ∅∅∅ D∅∅ ∅∅∅ ∅A∅
    R_LRU   = ∅∅∅ A∅∅ ∅∅∅ ∅D∅ ∅∅∅ A∅∅ ∅∅∅ ∅D∅
    Mp^t    = ABC DDD CCB BAA ABC DDD CCB BAA
               AB CCC DDC CBB BAB CCC DDC CBB
                A BBB BBD DCC CCA BBB BBD DCC

The contents of Mp at the end of the first cycle and at the end of the
second cycle are the same.

$FF_p = \sum_{t=1}^{24} |f_p^t| = 7$ page fetches.

Now, we compute the number of sector fetches for the same sector 
trace. 

    ST      = adgjk lhief bcadg jklhi efbc
    Fd      = adgjk lhief bcadg jklhi efbc
    R_LRU   = ∅∅∅∅∅ ∅∅∅∅a dgjkl hiefb cadg
    Ms^t    = adgjk lhief bcadg jklhi efbc
               adgj klhie fbcad gjklh iefb
                adg jklhi efbca dgjkl hief
                 ad gjklh iefbc adgjk lhie
                  a dgjkl hiefb cadgj klhi
                    adgjk lhief bcadg jklh
                     adgj klhie fbcad gjkl
                      adg jklhi efbca dgjk
                       ad gjklh iefbc adgj

The contents of Ms at the end of the first cycle and at the end of the
second cycle are the same.

$FF_s = \sum_{t=1}^{24} |f_s^t| = 24$ sector faults.

Therefore FFp = 7 < FFs/k = 24/3 = 8. QED.
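The two simulations in this proof can be replayed mechanically. The sketch below is only an illustration: the function names are assumptions, and LRU is modelled with an ordered dictionary rather than the stack tables shown above.

```python
from collections import OrderedDict


def lru_fetches(trace, frames):
    """Count demand fetches under LRU replacement."""
    resident = OrderedDict()  # least recently used item first
    fetches = 0
    for x in trace:
        if x in resident:
            resident.move_to_end(x)
        else:
            fetches += 1
            if len(resident) >= frames:
                resident.popitem(last=False)  # evict the LRU item
            resident[x] = True
    return fetches


def page_trace(sector_trace, partition):
    """Map each referenced sector to the index of the page holding it."""
    page_of = {s: j for j, page in enumerate(partition) for s in page}
    return [page_of[s] for s in sector_trace]


K, MP = 3, 3
PARTITION = ["abc", "def", "ghi", "jkl"]


def counts(cycles):
    """Return (FFp, FFs) for the trace (adgjklhiefbc)^cycles."""
    st = list("adgjklhiefbc" * cycles)
    ffp = lru_fetches(page_trace(st, PARTITION), MP)
    ffs = lru_fetches(st, MP * K)
    return ffp, ffs


ffp, ffs = counts(2)
print(ffp, ffs, ffs / K)  # 7 24 8.0
```

Calling `counts` with other cycle counts shows how the two fetch totals grow as the reference pattern repeats.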

It is interesting to observe that, if the above sector trace,
ST = $(adgjklhiefbc)^2$, consisting of two cycles through the same sector
reference pattern, were generalized to a sector trace
ST = $(adgjklhiefbc)^n$, consisting of n cycles, then FFp = 3+2n and
FFs = 12n. Hence, FFp is approximately a factor of 2 less than FFs/k
for large n. These last two values of FFp and FFs are easily verified by
observing that the paging and sectoring simulations of every cycle after
the first are respectively the same.

In our empirical studies of the paging behavior of real programs, we
found instances where

$$k \cdot FF_p(|M_p|, N, \Pi, ST, F_d, R_{LRU}) < FF_s(|M_s| = |M_p| \cdot k, ST, F_d, R_{LRU}).$$

These instances occurred for memory sizes |Mp| in the region of low
paging rates under good program structures, i.e., under partitions which
produced low values for FFp.

We point out in passing that other similar attempts to bound FFp for
certain memory constraints failed. For example,
$k \cdot FF_p(|M_p|, N, \Pi, ST, F_d, R_{FIFO})$ is not lower bounded by
$FF_s(|M_s| = |M_p| \cdot k, ST, F_d, R_{FIFO})$.

The interested reader may verify this by going through the
simulation in the proof of Theorem 2 with $R_{FIFO}$ and
ST = (a def bc ghi jkl de), while keeping everything else the same.
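The FIFO check can be carried out with a few lines of simulation. This is a sketch under one stated assumption: the trace is read as a def bc ghi jkl de, which follows the page grouping abc, def, ghi, jkl (the scanned text is hard to read at this point). The function names are illustrative only.

```python
from collections import deque


def fifo_fetches(trace, frames):
    """Count demand fetches under FIFO replacement."""
    queue = deque()   # resident items, oldest fetch first
    resident = set()
    fetches = 0
    for x in trace:
        if x in resident:
            continue
        fetches += 1
        if len(queue) >= frames:
            resident.discard(queue.popleft())  # evict the oldest fetch
        queue.append(x)
        resident.add(x)
    return fetches


K, MP = 3, 3
PARTITION = ["abc", "def", "ghi", "jkl"]
ST = "adefbcghijklde"  # assumed reading of the trace

page_of = {s: j for j, page in enumerate(PARTITION) for s in page}
ffp = fifo_fetches([page_of[s] for s in ST], MP)
ffs = fifo_fetches(ST, MP * K)
print(K * ffp, ffs)  # 12 14
```

Under this reading, k*FFp = 12 is smaller than FFs = 14, so the sectored FIFO count is not a lower bound.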



3.3 Upper Bounds 

How large can the value of the page fetch function become by
choosing the "worst" program structure, that is, the program structure
which results from the partition, $\Pi$, that maximizes the value of FFp?

Theorem 3

Given any two-level virtual memory system V, with page size N,
primary memory size |Mp|, demand fetch Fd, LRU replacement $R_{LRU}$, and
any sector trace STa, then for any partition, $\Pi_a$, of the relocatable
sectors into logical pages of the program, the maximum number of page
fetches given by the page fetch function FFp is upper bounded by
$$FF_p(|M_p|, N, \Pi_a, ST_a, F_d, R_{LRU}) \le FF_s(|M_s| = |M_p|, ST = ST_a, F_d, R_{LRU}),$$
where the value of the sector fetch function, FFs, is the number of
sector fetches which occur in a two-level virtual memory system V, with
primary memory size |Ms| = |Mp|, demand fetch Fd, and LRU replacement
$R_{LRU}$, using the same sector trace ST = STa.

Proof: Let:

$ST = x^1, x^2, \ldots, x^L$ be any sector trace.

$\Pi = \{\Pi_1, \Pi_2, \ldots, \Pi_n\}$ be any partition of sectors
into pages.

$P = p^1, p^2, \ldots, p^L$ be the resultant page trace
computed from $\Pi$ and ST.

$M_p^t$ = contents of memory of FFp model at time t.

$M_s^t$ = contents of memory of FFs model at time t.

$F_p = f_p^1, f_p^2, \ldots, f_p^L = F_d$ of FFp.

$R_p = r_p^1, r_p^2, \ldots, r_p^L = R_{LRU}$ of FFp.

$F_s = f_s^1, f_s^2, \ldots, f_s^L = F_d$ of FFs.

$R_s = r_s^1, r_s^2, \ldots, r_s^L = R_{LRU}$ of FFs.

Suppose, at time t in the FFp model, that $p^t = z$, the page
containing the set of sectors $\Pi_z$, is referenced. Then, at time t in
the FFs model, $x^t = z^*$ is the sector referenced, where sector $z^* \in \Pi_z$.

CASE 1.

Suppose $p^t \in M_p^{t-1}$. Then $f_p^t = \emptyset$.
If $x^t \in M_s^{t-1}$, then $f_s^t = \emptyset$, and $|f_p^t| = |f_s^t|$.
If $x^t \notin M_s^{t-1}$, then $f_s^t = \{z^*\}$, $r_s^t = \{b^*\} \subseteq M_s^{t-1}$, and
$|f_p^t| < |f_s^t|$.

CASE 2.

Suppose $p^t \notin M_p^{t-1}$. Then $f_p^t = \{z\}$, and
$r_p^t = \{b\} \subseteq M_p^{t-1}$ under LRU.

If $x^t \notin M_s^{t-1}$, then $f_s^t = \{z^*\}$, $r_s^t = \{b^*\} \subseteq M_s^{t-1}$ under
LRU, and $|f_p^t| = |f_s^t|$.

If $x^t \in M_s^{t-1}$, then $f_s^t = \emptyset$, and $|f_p^t| > |f_s^t|$. This
condition causes a problem.

We will prove that $p^t \notin M_p^{t-1}$ and $x^t \in M_s^{t-1}$ can never occur
together.

Assume $x^t \in M_s^{t-1}$. Let $t^* < t$ be the largest time, $t'$, such
that $x^{t'} = x^t$; then $p^t \in M_p^{t^*}$. Since $p^t \notin M_p^{t-1}$,
there occurred at least |Mp| distinct page references to Mp in the
interval between times $t^*$ and $t-1$, none of which were $p^t$. Therefore, there were at
least |Ms| = |Mp| distinct sector references to Ms in the same
interval, none of which were $x^t$, and $x^t \in M_s^{t-1}$; but this
contradicts $R_s = R_{LRU}$. Thus, $x^t \notin M_s^{t-1}$ if $p^t \notin M_p^{t-1}$.

Hence, $\sum_{t=1}^{L} |f_p^t| \le \sum_{t=1}^{L} |f_s^t|$ and the theorem
is proved.

Corollary 3a

$$FF_s(k \cdot |M_p|, ST, F_d, R_o)/k \le FF_p(|M_p|, N, \Pi_a, ST_a, F_d, R_{LRU}) \le FF_s(|M_p|, ST, F_d, R_{LRU})$$

Proof:

Follows immediately from Theorems 1 and 3.

Theorem 3 provides an upper bound on the value of the page fetch
function, FFp, over all partitions, $\Pi_a$, of the relocatable sectors into
logical pages for virtual memory systems which employ the popular demand
fetch and LRU removal algorithms. Under what conditions will the upper
bound given by Theorem 3 be the least upper bound or even a tight upper
bound?

Let the interval of time between a fetch of any page and the
subsequent removal of that page be called a page lifetime. Now, consider
a partition, $\Pi_c$, of sectors into logical pages, such that, during a
lifetime of any page, only one of the sectors of that page is
referenced. However, let this one sector be referenced any number of
times in a given page lifetime, and let the particular sector which is
referenced vary from lifetime to lifetime. We will say that such a
partition satisfies the page lifetime constraint.

For any partition which satisfies the page lifetime constraint, it is
clear that the bound of Theorem 3 is the least upper bound. This implies that
Theorem 3 produces a tight bound to the extent that partitions exist which
group together sectors that are not used close together in time.
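Whether a given partition satisfies the page lifetime constraint for a given trace can be tested by simulation. The following is a minimal sketch, assuming demand fetch and LRU removal and treating the final, unterminated residency of a page as a lifetime too; the function name and the tiny trace are hypothetical.

```python
from collections import OrderedDict


def satisfies_page_lifetime_constraint(sector_trace, partition, frames):
    """True if at most one distinct sector is referenced per page lifetime."""
    page_of = {s: j for j, page in enumerate(partition) for s in page}
    # resident: page -> sectors referenced in its current lifetime (LRU first)
    resident = OrderedDict()
    for s in sector_trace:
        p = page_of[s]
        if p in resident:
            resident.move_to_end(p)
        else:
            if len(resident) >= frames:
                resident.popitem(last=False)  # removal ends that page's lifetime
            resident[p] = set()               # fetch starts a new lifetime
        resident[p].add(s)
        if len(resident[p]) > 1:
            return False
    return True


# Hypothetical four-sector program with one page frame.
print(satisfies_page_lifetime_constraint("acbda", ["ab", "cd"], 1))  # True
print(satisfies_page_lifetime_constraint("acbda", ["ac", "bd"], 1))  # False
```

In the first partition every reference causes a page fetch, which is the situation in which the sectored LRU count and the paged LRU count coincide.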

Since LRU is also a stack algorithm, the values for the upper bound
given by Theorem 3 can be computed for all memory sizes by one simulation
of the sectoring activity for $FF_s(|M_s| = |M_p|, ST, F_d, R_{LRU})$.
Therefore, by applying Theorems 1 and 3 a graph similar in form to that
shown in Figure 4 can be obtained. The gap between the two curves
represents the range of values of the page fetch function, FFp, over all
partitions when demand page fetch and LRU page replacement policies are
employed. For a particular program structure, the value of FFp in
relation to the upper and lower bounds can be used to evaluate the
potential of program restructuring.

In Chapter 6, we will present empirical results which show that the
bounds given by Theorem 3 are quite reasonable for several actual
programs. This implies that real programs can have sector arrangements
which result in a large number of page fetches. In fact, we found in our studies
of real programs that the actual value of the page fetch function can
vary by a factor of tens for two different orderings of sectors into the
logical pages. All of these results for real programs are given in
Chapter 6. However, we will now present an example which shows the
logistics of applying Theorems 1 and 3.



3.4 Simple Example of Computing Bounds 

We have chosen a very simple, compressed sector trace of a rather
small program so that (a) we can illustrate the actual computation of the
upper and lower bounds and (b) we can easily obtain the best and worst
sector partitions. Note that this example does not represent any of the
real programs we tested, since in those cases the minimum number of
references in any sector trace was over 1/2 million. Even though this
example does not represent an actual program, it does indicate that, even
when 2/3 of this program can fit into primary memory, there is a wide
variation in its paging behavior over sector partitions. It also
illustrates that there are simple sector traces where the bounds given by
Theorems 1 and 3 are simultaneously the greatest lower bound and the
smallest upper bound, respectively.

FIGURE 4.
The Allowable Values of FFp as a Function of |Mp|

Example of Results: 

Consider a virtual memory system with parameters:

|Mp| = 2.

k = 3 sectors per page.

F = demand fetch, or Fd.

R = LRU, or $R_{LRU}$.

Program = {a, b, c, d, e, f, g, h, i}, a set of 9 relocatable
sectors of size N/3.

ST = aehae hbdgb dgaeh bficf ibeha dgadg.

|ST| = 30.

Applying Theorem 1, we compute FFs(|Ms| = 6, ST, Fd, Ro):

    ST      = aehae hbdgb dgaeh bficf ibeha dgadg
    Fd      = aeh∅∅ ∅bdg∅ ∅∅∅∅∅ ∅fic∅ ∅∅∅∅a dg∅∅∅
    Ro      = ∅∅∅∅∅ ∅∅∅∅∅ ∅∅∅∅∅ ∅dga∅ ∅∅∅∅c fi∅∅∅
    Ms^t    = aehae hbdgb dgaeh bficf ibeha dgadg
               aeha ehbdg bdgae hbfic fibeh adgad
                aeh aehbd gbdga ehbfi cfibe hadga
                     aehh hhbdg aehbb bcfib ehhhh
                      aee eehbd gaehh hhcfi beeee
                       aa aaehb dgaee eehcf ibbbb

FFs = 12 sector fetches, so the theoretical minimum = 12/3 = 4 page fetches.
******************************************************

Applying Theorem 3, we compute $FF_s(|M_s| = 2, ST, F_d, R_{LRU})$:

    ST      = aehae hbdgb dgaeh bficf ibeha dgadg
    Fd      = aehae hbdgb dgaeh bficf ibeha dgadg
    R_LRU   = ∅∅aeh aehbd gbdga ehbfi cfibe hadga
    Ms^t    = aehae hbdgb dgaeh bficf ibeha dgadg
               aeha ehbdg bdgae hbfic fibeh adgad

Theoretical maximum = 30 page fetches.
*******************************************************
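Both hand computations can be reproduced with a short simulation. The sketch below is illustrative only: the function names are assumptions, and optimum replacement is implemented as farthest-next-use eviction, which may remove different sectors than the table above while giving the same fetch count.

```python
from collections import OrderedDict


def lru_fetches(trace, frames):
    """Count demand fetches under LRU replacement."""
    resident = OrderedDict()  # least recently used item first
    fetches = 0
    for x in trace:
        if x in resident:
            resident.move_to_end(x)
        else:
            fetches += 1
            if len(resident) >= frames:
                resident.popitem(last=False)
            resident[x] = True
    return fetches


def opt_fetches(trace, frames):
    """Count demand fetches under optimum (farthest-next-use) replacement."""
    INF = float("inf")
    next_use = [INF] * len(trace)
    last_seen = {}
    for t in range(len(trace) - 1, -1, -1):
        next_use[t] = last_seen.get(trace[t], INF)
        last_seen[trace[t]] = t
    resident = {}  # sector -> time of its next reference
    fetches = 0
    for t, x in enumerate(trace):
        if x not in resident:
            fetches += 1
            if len(resident) >= frames:
                del resident[max(resident, key=resident.get)]
        resident[x] = next_use[t]
    return fetches


ST = "aehaehbdgbdgaehbficfibehadgadg"
K, MP = 3, 2
lower = opt_fetches(ST, MP * K) / K   # Theorem 1 bound
upper = lru_fetches(ST, MP)           # Theorem 3 bound
print(lower, upper)  # 4.0 30
```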

There are

$$\frac{9!}{(3!)^{9/3} \, (9/3)!} = 280$$

distinct ways of reordering the 9 relocatable sectors into 3 pages.

Consider $\Pi_1$ = {abc, def, ghi} where page A = abc, etc.
Now we compute $FF_p(|M_p| = 2, \Pi_1, ST, F_d, R_{LRU})$.
For ST = (aehae hbdgb dgaeh bficf ibeha dgadg), we get

    P       = ABCAB CABCA BCABC ABCAB CABCA BCABC
    Fd      = ABCAB CABCA BCABC ABCAB CABCA BCABC
    R_LRU   = ∅∅ABC ABCAB CABCA BCABC ABCAB CABCA
    Mp^t    = ABCAB CABCA BCABC ABCAB CABCA BCABC
               ABCA BCABC ABCAB CABCA BCABC ABCAB

$FF_p = \sum_{t=1}^{30} |f_p^t| = 30$ page fetches for $\Pi_1$ = theoretical
maximum.



Consider $\Pi_2$ = {dag, beh, cfi}, where page A = dag, etc.
Now we compute $FF_p(|M_p| = 2, \Pi_2, ST, F_d, R_{LRU})$.
For ST = (aehae hbdgb dgaeh bficf ibeha dgadg), we get

    P       = ABBAB BBAAB AAABB BCCCC CBBBA AAAAA
    Fd      = AB∅∅∅ ∅∅∅∅∅ ∅∅∅∅∅ ∅C∅∅∅ ∅∅∅∅A ∅∅∅∅∅
    R_LRU   = ∅∅∅∅∅ ∅∅∅∅∅ ∅∅∅∅∅ ∅A∅∅∅ ∅∅∅∅C ∅∅∅∅∅
    Mp^t    = ABBAB BBAAB AAABB BCCCC CBBBA AAAAA
               AABA AABBA BBBAA ABBBB BCCCB BBBBB

$FF_p = \sum_{t=1}^{30} |f_p^t| = 4$ page fetches for $\Pi_2$ = theoretical
minimum.
**********************************************************

In the above example, the theoretical minimum value of FFp = 4
from Theorem 1 and the theoretical maximum value of FFp = 30 from
Theorem 3 were found to be the greatest lower bound and the smallest
upper bound, respectively, over all partitions, $\Pi$.
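Because the example is so small, the claim can also be checked by brute force over every partition. The following sketch enumerates all partitions of the 9 sectors into 3 pages of 3 sectors and simulates demand fetch with LRU removal for each; the helper names are assumptions introduced for this illustration.

```python
from collections import OrderedDict
from itertools import combinations


def lru_fetches(trace, frames):
    """Count demand fetches under LRU replacement."""
    resident = OrderedDict()  # least recently used item first
    fetches = 0
    for x in trace:
        if x in resident:
            resident.move_to_end(x)
        else:
            fetches += 1
            if len(resident) >= frames:
                resident.popitem(last=False)
            resident[x] = True
    return fetches


def partitions_into_triples(sectors):
    """Yield every partition of `sectors` into unordered blocks of three."""
    if not sectors:
        yield []
        return
    first, rest = sectors[0], sectors[1:]
    for pair in combinations(rest, 2):
        remaining = [s for s in rest if s not in pair]
        for tail in partitions_into_triples(remaining):
            yield [(first,) + pair] + tail


def ffp(partition, sector_trace, frames):
    """Page fetches for one partition under demand fetch and LRU."""
    page_of = {s: j for j, page in enumerate(partition) for s in page}
    return lru_fetches([page_of[s] for s in sector_trace], frames)


ST = "aehaehbdgbdgaehbficfibehadgadg"
values = [ffp(p, ST, 2) for p in partitions_into_triples(list("abcdefghi"))]
print(len(values), min(values), max(values))  # 280 4 30
```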



3.5 Extensions to Lower Bounds 

In section 3.2, lower bounds were derived for the case where each 
page contained at most k sectors. In this section, we would like to 
relax this constraint. 

What were the problems associated with the constraint that pages of
a partition must contain at most k sectors? There are no problems when
the sectors are all the same size. However, when the sizes of the
sectors vary considerably, it becomes more complex to determine the best
k. For example, if one chooses k to be the maximum number of sectors
which could fit into any page, then all partitions are
allowable, but the value of

$$\frac{FF_s(|M_s| = |M_p| \cdot k, ST, F_d, R_o)}{k}$$

might not produce a bound which is as tight as we can produce. This is
due to two reasons. First, since |Ms| = |Mp| * k, the size of Ms might be
larger than necessary to always hold the sectors present in pages of Mp.
Note that some pages of Mp might hold fewer than k sectors and that FFs
is a monotonically decreasing function of |Ms|. Second, perhaps we can
reduce the divisor k when some pages must contain fewer than k sectors.

On the other hand, if one chooses k to be some value less than the
maximum number of sectors which could fit into a page, then some of the
partitions are not considered.

We will now consider all partitions of relocatable sectors into
pages. The only constraint is, as before,

$$\sum_{S_j \in \Pi_i} |S_j| \le N \quad \text{for all } i,$$

which simply
states that the size of any block of the partition in bytes must not
exceed the page size, N, in bytes. Note that this set of all partitions is
the same as the set of partitions when k is chosen equal to the maximum
number of sectors which could physically fit into a page. However, we
will find tighter bounds.

Consider a program which consists of m relocatable sectors of
various sizes. We define the "sector size vector", SS, to be a sequence
of sizes of these m sectors, $SS = |S_1|, |S_2|, \ldots, |S_m|$, such that
$|S_i| \le |S_j|$ for all $i < j$, $1 \le i, j \le m$, where $|S_i|$ is the size of $S_i$ in
bytes. Recall that:

|Mp| is the number of page frames in the
paged memory, Mp.

N is the page frame size in bytes.

Now we define a function, $f_1$, in terms of |Mp|, N, and SS:
$f_1(|M_p|, N, SS)$ = the maximum number of sectors of sizes in SS which can
be packed into a set of |Mp| page frames of size N bytes each, when
sectors are not allowed to cross page boundaries.

Example.
Let:
$|S_1| = |S_2| = |S_3| = 1000$ bytes; $|S_4| = 2000$ bytes;
$|S_5| = |S_6| = 3000$ bytes.
N = 4000 bytes
then,

$f_1(1, N, SS) = 3$
$f_1(2, N, SS) = 5$
$f_1(3, N, SS) = 6$

Since the computation of $f_1$ can become a complex combinatorial
problem in itself, we will give an easy method of computing an upper
bound for $f_1$.

The function $f_1^u$ is defined in terms of |Mp|, N, SS as follows.
$f_1^u(|M_p|, N, SS) = h$ if and only if
$\sum_{i=1}^{h} |S_i| \ge |M_p| \cdot N$ and $\sum_{i=1}^{h-1} |S_i| < |M_p| \cdot N$.

It should be clear that $f_1(|M_p|, N, SS) \le f_1^u(|M_p|, N, SS)$ for all
|Mp|, N, SS. For the above example,

$f_1^u(1, N, SS) = 4$

$f_1^u(2, N, SS) = 5$

$f_1^u(3, N, SS) = 6$.
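The two functions can be sketched for small inputs as follows. Several points are assumptions made for this illustration: the page size is taken as N = 4000 bytes, the prefix-sum condition is read with "at least" on the first sum (the reading that reproduces the values listed for the example), the upper bound is taken to be the total number of sectors when all of them fit, and the exact packing is done by simple backtracking, which is only practical for a handful of sectors.

```python
def f1_upper(mp, n, sizes):
    """Smallest h whose h smallest sectors reach mp*n bytes (all sectors if none does)."""
    total = 0
    for h, size in enumerate(sorted(sizes), start=1):
        total += size
        if total >= mp * n:
            return h
    return len(sizes)


def fits(sizes, free):
    """True if every sector in `sizes` can be placed in a frame without overflow."""
    if not sizes:
        return True
    size, rest = sizes[0], sizes[1:]
    tried = set()  # skip frames with identical free space
    for i, room in enumerate(free):
        if room >= size and room not in tried:
            tried.add(room)
            free[i] -= size
            ok = fits(rest, free)
            free[i] += size
            if ok:
                return True
    return False


def f1(mp, n, sizes):
    """Maximum number of sectors that can be packed into mp frames of n bytes."""
    ordered = sorted(sizes)
    best = 0
    for h in range(1, len(ordered) + 1):
        # If any h sectors fit, the h smallest fit too, so prefixes suffice.
        if not fits(sorted(ordered[:h], reverse=True), [n] * mp):
            break
        best = h
    return best


SS = [1000, 1000, 1000, 2000, 3000, 3000]
N = 4000  # assumed page size in bytes
print([f1(mp, N, SS) for mp in (1, 2, 3)])        # [3, 5, 6]
print([f1_upper(mp, N, SS) for mp in (1, 2, 3)])  # [4, 5, 6]
```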

Let us interpret a particular form of $f_1$; that is, if |Mp| = 2,
then $f_1(2, N, SS)$ is by definition the maximum number of sectors which
can be packed into 2 page frames of N bytes each.

We can use $f_1(|M_p|, N, SS)$, $f_1(2, N, SS)$ and the sector fetch
function, FFs, to lower bound the page fetch function, FFp, as follows.

Theorem 4. 

Given any two-level virtual memory system V, with page size N,
primary memory size |Mp|, any valid page replacement algorithm Ra, demand
page fetch Fd, and any sector trace STa, then for any partition $\Pi_a$ of the
relocatable sectors into the logical pages of the program, the minimum
number of page fetches given by the page fetch function FFp is lower
bounded by

$$FF_p(|M_p|, N, \Pi_a, ST_a, F_d, R_a) \ge \frac{FF_s(|M_s| = f_1(|M_p|, N, SS), ST, F_d, R_o)}{f_1(2, N, SS)/2} - \Delta,$$

where

$$\Delta = \frac{2 f_1(1, N, SS) - f_1(2, N, SS)}{f_1(2, N, SS)},$$

and where the value of the sector fetch function FFs is the number of sector
fetches which occur in a two-level virtual memory system V, with primary
memory size $|M_s| = f_1(|M_p|, N, SS)$, using demand fetch Fd, optimum
replacement algorithm Ro, and the same sector trace ST = STa. The
function $f_1$ is as previously defined, and SS is the sector size vector.

Corollary 4a

$$FF_p(|M_p|, N, \Pi_a, ST_a, F_d, R_a) \ge \frac{FF_s(|M_s| = f_1(|M_p|, N, SS), ST, F_d, R_o)}{f_1(2, N, SS)/2} - 1$$
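A short check of why the constant 1 suffices here, offered as a sketch rather than as the author's own argument. It assumes only that $f_1$ cannot decrease when a page frame is added, so that $f_1(1, N, SS) \le f_1(2, N, SS)$:

```latex
\Delta
  = \frac{2 f_1(1, N, SS) - f_1(2, N, SS)}{f_1(2, N, SS)}
  \le \frac{2 f_1(2, N, SS) - f_1(2, N, SS)}{f_1(2, N, SS)}
  = 1 .
```

Replacing $\Delta$ by 1 in Theorem 4 can therefore only weaken the bound. For the sector sizes of the preceding example, $f_1(1, N, SS) = 3$ and $f_1(2, N, SS) = 5$ give $\Delta = (6 - 5)/5 = 1/5$.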



Corollary 4b

$$FF_p(|M_p|, N, \Pi_a, ST_a, F_d, R_a) \ge \frac{FF_s(|M_s| = U_1, ST, F_d, R_o)}{U_2/2} - 1,$$

where $U_1$ equals either $f_1(|M_p|, N, SS)$ or $f_1^u(|M_p|, N, SS)$, and $U_2$,
independently of $U_1$, equals either $f_1(2, N, SS)$ or $f_1^u(2, N, SS)$.

Corollary 4b says that we can lower bound FFp in terms of the easily
computed function $f_1^u$.

Corollary 4c

$$\frac{FF_s(|M_s| = |M_p| \cdot k, ST_a, F_d, R_o)}{k} \le \frac{FF_s(|M_s| = f_1(|M_p|, N, SS), ST, F_d, R_o)}{f_1(2, N, SS)/2}$$

Corollary 4c states that the bounds given by Theorem 4 may be tighter
than the bounds given by Theorem 1 where k is the maximum number of
sectors which can physically fit into a page.



Proof of Theorem 4 
Notation and properties 

Let $ST_a = x^1, x^2, \ldots, x^L$ where $x^t$ is the sector referenced
at time t. For virtual memory system V and FFp, let:

1. $\Pi_a = \{\Pi_1, \Pi_2, \ldots, \Pi_n\}$ be any partition of sectors into
the n logical pages of the program where each page contains any
number of sectors such that $\sum_{S_j \in \Pi_i} |S_j| \le N$ for
$1 \le i \le n$.

2. $P = (p^1, p^2, \ldots, p^L)$ be the resultant page trace computed
uniquely from ST and $\Pi_a$, such that if $x^t \in \Pi_j$, then $p^t = j$.

3. $F_a = f_a^1, f_a^2, \ldots, f_a^L$ be the demand fetch policy, where
$f_a^t \subseteq \Pi_a$ and $f_a^t \cap M_p^{t-1} = \emptyset$ and $|f_a^t| = 1$ or 0, the
number of pages in $f_a^t$. Note that we have chosen to denote Fd
for FFp by Fa to avoid notational conflict with the Fd for FFs.

4. $R_a = r_a^1, r_a^2, \ldots, r_a^L$ be any removal policy where
$r_a^t \subseteq \Pi_a$ and $r_a^t \subseteq M_p^{t-1}$ and $|r_a^t| = 1$ or 0, the
number of pages in $r_a^t$.

5. $M_p^t$ be the set of pages in Mp at time t and $M_p^0 = \emptyset$.

6. $M_p^t = (M_p^{t-1} \cup f_a^t) - r_a^t$.

First we prove Lemma 5.

Lemma 5:

There exist valid demand fetch and removal policies, Fd and Rd, for
the FFs model such that

\[
FFp(|Mp|, N, \Pi_a, STa, Fd, Ra) \;\ge\;
\frac{FFs(|Ms| = f_1(|Mp|,N,SS),\, ST, Fd, Rd)}{f_1(2,N,SS)/2} - A,
\]

\[
\text{where } A = \frac{2 f_1(1,N,SS) - f_1(2,N,SS)}{f_1(2,N,SS)}.
\]

Proof:

For the FFs model, Fd and Rd will be constructed by forming a
sequence of valid replacement and fetch policies
$(F_1,R_1), (F_2,R_2), \ldots, (F_h,R_h)$, where:

1. $F_1 = f_1^1, f_1^2, \ldots, f_1^L$, and $f_1^t = g(f_a^t)$ =
the set of sectors making up the page in $f_a^t$, for $1 \le t \le L$.

2. Similarly $R_1 = r_1^1, r_1^2, \ldots, r_1^L$ and
$r_1^t = g(r_a^t)$, for $1 \le t \le L$.

3. $F_h = Fd = f_d^1, f_d^2, \ldots, f_d^L$ and
$R_h = Rd = r_d^1, r_d^2, \ldots, r_d^L$, for $1 \le t \le L$, where
$f_d^t = r_d^t = \emptyset$ if $x^t \in M_d^{t-1}$;
$f_d^t = x^t$ and $r_d^t = \emptyset$ if $x^t \notin M_d^{t-1}$ and $|M_d^{t-1}| < |Ms|$;
$f_d^t = x^t$ and $r_d^t = \{b\} \subseteq M_d^{t-1}$ if
$x^t \notin M_d^{t-1}$ and $|M_d^{t-1}| = |Ms|$; and
$M_d^t = (M_d^{t-1} \cup f_d^t) - r_d^t$ to satisfy demand sectoring.

Since $|Ms| = f_1(|Mp|,N,SS) \ge ||Mp||$, Lemma 4a says that the above
construction exists such that

\[ \sum_{t=1}^{L} |f_1^t| \;\ge\; \sum_{t=1}^{L} |f_d^t|. \]

Therefore, we have Fact 1:

Fact 1.

\[
\sum_{t=1}^{L} |f_1^t| \;\ge\; \sum_{t=1}^{L} |f_d^t|
= FFs(|Ms| = f_1(|Mp|,N,SS),\, ST, Fd, Rd).
\]

Now, let's prove Fact 2.

Fact 2.

\[
\sum_{t=1}^{L} |f_1^t| \;\le\;
\frac{f_1(2,N,SS)\, FFp(|Mp|, N, \Pi_a, STa, Fd, Ra) + f_1(2,N,SS) \cdot A}{2}
\]

Proof.

\[
\sum_{t=1}^{L} |f_1^t| = \sum_{t=1}^{L} |g(f_a^t)| = \sum_{t=1}^{L} |f_a^t|\,|g(f_a^t)|,
\]

since $|f_a^t| = 1$ iff $|g(f_a^t)| > 0$ and $|f_a^t| = 0$ iff
$|g(f_a^t)| = 0$.

Note that $|g(f_a^t)|$ is the number of sectors in the page specified by $f_a^t$.

Also, note that $\sum_{t=1}^{L} |f_a^t| = FFp(|Mp|, N, \Pi_a, STa, Fd, Ra)$.

Now let's compress $Fa = f_a^1, f_a^2, \ldots, f_a^L$ to get
$F'_a = {f'}_a^{1}, {f'}_a^{2}, \ldots, {f'}_a^{L'}$ by taking out all the $f_a^t = \emptyset$.
Clearly $\sum_{t=1}^{L'} |{f'}_a^{t}| = \sum_{t=1}^{L} |f_a^t|$ and

\[
\sum_{t=1}^{L} |f_a^t|\,|g(f_a^t)| = \sum_{t=1}^{L'} |{f'}_a^{t}|\,|g({f'}_a^{t})| = \sum_{t=1}^{L} |f_1^t|.
\]

Furthermore, note that, under the definition of demand fetch, no two
successive page fetches can be to the same page. This is obvious, since
under demand fetch a page is fetched and is kept in primary memory until
it has to be removed to make room for another page.

Therefore no two successive values $g({f'}_a^{t})$ and $g({f'}_a^{t+1})$ can
be the same.

Now, the sum $\sum_{t=1}^{L'} |{f'}_a^{t}|\,|g({f'}_a^{t})|$ is clearly maximized
if, for all odd t, $|g({f'}_a^{t})|$ is equal to the maximum number of
sectors which can fit in a page, and if, for all even t, $|g({f'}_a^{t})|$ is
equal to the next maximum number of sectors which can fit in a page.

Thus,

\[
\sum_{t=1}^{L} |f_1^t| = \sum_{t=1}^{L'} |{f'}_a^{t}|\,|g({f'}_a^{t})|
\;\le\; \sum_{t=1}^{L'} |{f'}_a^{t}|\, \frac{f_1(2,N,SS)}{2}
\quad \text{for even } L', \text{ and}
\]

\[
\sum_{t=1}^{L} |f_1^t| = \sum_{t=1}^{L'} |{f'}_a^{t}|\,|g({f'}_a^{t})|
\;\le\; \sum_{t=1}^{L'-1} |{f'}_a^{t}|\, \frac{f_1(2,N,SS)}{2} + |{f'}_a^{L'}|\, f_1(1,N,SS)
\quad \text{for odd } L'.
\]

Note that $f_1(1,N,SS) = f_1(1,N,SS) + \frac{f_1(2,N,SS)}{2} - \frac{f_1(2,N,SS)}{2}$, and thus

\[
\sum_{t=1}^{L} |f_1^t| \;\le\; \frac{f_1(2,N,SS)}{2} \sum_{t=1}^{L'} |{f'}_a^{t}|
+ f_1(1,N,SS) - \frac{f_1(2,N,SS)}{2}
\quad \text{for all } L'.
\]

Hence,

\[
\sum_{t=1}^{L} |f_1^t| \;\le\;
\frac{f_1(2,N,SS)\, FFp(|Mp|, N, \Pi_a, STa, Fd, Ra) + \left(2 f_1(1,N,SS) - f_1(2,N,SS)\right)}{2},
\]

and Fact 2 is proved.

From Fact 1 and Fact 2, we have

\[
FFp(|Mp|, N, \Pi_a, STa, Fd, Ra) \;\ge\;
\frac{FFs(|Ms| = f_1(|Mp|,N,SS),\, ST, Fd, Rd)}{f_1(2,N,SS)/2} - A.
\]

This proves Lemma 5.

Now, from Lemma 1, we know that

\[
FFs(|Ms| = f_1(|Mp|,N,SS),\, ST, Fd, Rd) \;\ge\; FFs(|Ms| = f_1(|Mp|,N,SS),\, ST, Fd, Ro).
\]

Therefore, Theorem 4 follows immediately. QED.

Proof of Corollary 4a:

It follows immediately from the fact that

\[ 0 \;\le\; \frac{2 f_1(1,N,SS) - f_1(2,N,SS)}{f_1(2,N,SS)} \;\le\; 1. \]

Proof of Corollary 4b:

Corollary 4b follows directly from Theorem 4 and Lemmas 2, 3.
Lemmas 2, 3 give

\[
FFs(|Ms| = f_1(|Mp|,N,SS),\, ST, Fd, Ro) \;\ge\; FFs(|Ms| = U_1,\, ST, Fd, Ro)
\]

since $U_1 \ge f_1(|Mp|,N,SS)$. The divisor goes through since
$U_2 \ge f_1(2,N,SS)$.

Proof of Corollary 4c:

Corollary 4c follows from Lemmas 2, 3 since
$|Mp| \cdot k \ge f_1(|Mp|,N,SS)$, and $k \ge f_1(2,N,SS)/2$.

To compute the lower bound of Theorem 4, simply make one sector
simulation run through the sector trace and record the number of sector
fetches for each possible sector memory size. Then for a particular
value of |Mp|, use $f_1(|Mp|,N,SS)$ to select the proper value of FFs,
divide by $f_1(2,N,SS)/2$, and subtract A to get the bound.
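
The procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sectored memory size is counted in sectors, the optimum replacement algorithm Ro is taken to be the usual "evict the sector whose next reference is farthest in the future" rule, and the values of $f_1$ are supplied by the caller rather than computed. The function names `opt_sector_fetches`, `fetch_curve`, and `theorem4_bound` are hypothetical and do not appear in the report.

```python
def opt_sector_fetches(trace, capacity):
    """Count demand fetches for `trace` with `capacity` sector slots,
    evicting the resident sector whose next reference is farthest away."""
    # next_use[t] = index of the next reference to trace[t], or infinity.
    next_use = [0] * len(trace)
    upcoming = {}
    for t in range(len(trace) - 1, -1, -1):
        next_use[t] = upcoming.get(trace[t], float("inf"))
        upcoming[trace[t]] = t

    resident = {}  # sector -> index of its next reference
    fetches = 0
    for t, sector in enumerate(trace):
        if sector not in resident:
            fetches += 1
            if len(resident) >= capacity:
                victim = max(resident, key=resident.get)
                del resident[victim]
        resident[sector] = next_use[t]
    return fetches


def fetch_curve(trace):
    """Sector fetch counts for every sector memory size from 1 up to the
    number of distinct sectors (one simulation per size, for clarity)."""
    return {c: opt_sector_fetches(trace, c)
            for c in range(1, len(set(trace)) + 1)}


def theorem4_bound(trace, f1_mp, f1_one, f1_two):
    """Lower bound of Theorem 4, given f1(|Mp|,N,SS), f1(1,N,SS), f1(2,N,SS)."""
    a = (2 * f1_one - f1_two) / f1_two
    return opt_sector_fetches(trace, f1_mp) / (f1_two / 2) - a
```

A stack-algorithm implementation could produce the whole curve in a single pass, as the text suggests; the sketch repeats the simulation per size only to keep the code short.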

If the objective is to lower bound FFp over all partitions, then
Theorem 4 may give tighter bounds than Theorem 1 if the range of sector
sizes is large. This is the case when $f_1(|Mp|,N,SS) < k \cdot |Mp|$.
Furthermore, $f_1(|Mp|,N,SS)$ can become substantially less than $k \cdot |Mp|$
for large values of |Mp|. The term $f_1(2,N,SS)/2$ in the lower bound
is the average value of k for the two pages having the largest number of
sectors. We cannot extend this average over all pages, since every other
page fetch could be to the page containing the largest number of sectors,
while every intervening fetch could be to the page having the second
largest number of sectors. Even if all pages are fetched, if the above
behavior occurs sufficiently often in the execution of a program, then we
still cannot average over all pages.

Is there any way to compensate for the case when some sectors are
much larger than others? For ease in the following discussion, let the
average value of k for the two pages having the largest number of sectors,
$f_1(2,N,SS)/2$, be denoted by k', and let the average size of these
sectors be denoted by N/k'. In order to illustrate some typical values
one may encounter, we point out that for the real programs we
investigated, the values of k' were on the order of 3 to 6, and, hence,
N/k' was 1/3 to 1/6 of a page for a page size of 4096 bytes. Now let's
assume that we are given a particular program, Q, and we compute the
value of N/k' and find that there are several sectors whose sizes are
considerably larger than N/k'. Now consider what happens if we break up
these large sectors into as many subsectors as we can without increasing
the value of k'. This new program with the large sectors replaced by the
smaller subsectors is called Q*. Given Q*, it is still quite easy to
compute a sector trace over Q* from the address trace. We call such a
sector trace ST*. Using this sector trace, ST*, and the program,
Q*, we can apply Theorem 4 to compute the lower bound on the page fetch
function, FFp, over all partitions, Πa*, of sectors of Q* into
logical pages. We present two important observations on this lower
bound:

A). This lower bound is valid over all partitions of sectors of
Q* into pages. Therefore, the lower bound is certainly true for all
the partitions over Q* that are constrained to comply with Q. That is,
if a page in a partition of Q* contained one subsector of a sector,
then it would have to contain all the subsectors of that sector. This
restriction on the set of all partitions over Q* simply produces the
set of partitions which result when reprogramming is not allowed.
Let Πar* denote any such restricted partition of Q*.

B). This lower bound using ST* and Πar* over Q* is probably
much larger for most real programs than the lower bound computed by
Theorem 4 using ST and Πa over Q. The rationale for this is simply that
it will take several subsector fetches to bring into the sectored memory
the same information that could be fetched by one large-sector fetch.

Observation B need not necessarily be true; that is, the lower
bound which results from breaking up the large sectors could
theoretically be smaller than the lower bound computed by not breaking up
the large sectors. However, this presents no practical problems. Since
both methods will produce valid lower bounds, we simply compute both and
use the greater lower bound. In our analysis of real programs, we found
that the lower bound computed from breaking up the large sectors was
substantially larger than the lower bound computed when the large sectors
were not divided.

We will now formalize the notions of Q* and ST* and define the
relationship between Q and Q* and between ST and ST*. Then, Theorem
5 is presented, which states that the page fetch function, FFp, is lower
bounded in terms of the sectoring behavior given by ST*.

Let $Q = Q_1 \cup Q_2$ = {set of m relocatable sectors of any program},
where $Q_1 = \{S_1, S_2, \ldots, S_k\}$ and $Q_2 = \{S_{k+1}, S_{k+2}, \ldots, S_m\}$,
such that $f_1(2,N,SS)/2 = k/2$ and $|S_i| \le |S_j|$ for all $S_i \in Q_1$ and
$S_j \in Q_2$.

Let $SS = |r_1|, |r_2|, \ldots, |r_k|, |r_{k+1}|, \ldots, |r_m|$ be the sector
size vector of Q; that is, $r_i \in Q$ and $|r_i| \le |r_j|$ for $i \le j$, and
$|r_m| \le N$, the page size.

Note that $|r_k|$ is the size of the largest sector in $Q_1$.
Furthermore, note that the above construction is always possible.

Now, we break up the large sectors of Q into subsectors. Let
$S_i = \{S_{i1}^*\}$ for $1 \le i \le k$ and
$S_i = \{S_{i1}^*, S_{i2}^*, \ldots, S_{i l_i}^*\}$ for $k < i \le m$, such that

\[ |r_k| \le |S_{ij}^*|. \]

This last constraint is sufficient to guarantee that
$f_1(2,N,SS)/2 = k/2$ does not change because of the small subsectors.
In practice, one could choose $|S_{ij}^*| = |r_k|$ for $1 \le j < l_i$ and
$|r_k| \le |S_{ij}^*| < 2|r_k|$ for $j = l_i$.

Now define $Q^* = Q_1^* \cup Q_2^*$ = {set of m' relocatable sectors
of the same program},
where $Q_1^* = \{S_{11}^*, S_{21}^*, \ldots, S_{k1}^*\}$ and
$Q_2^* = \{S_{k+1,1}^*, S_{k+1,2}^*, \ldots, S_{k+1,l_{k+1}}^*, \ldots, S_{m,1}^*, S_{m,2}^*, \ldots, S_{m,l_m}^*\}$.

Let $SS^* = |r_1^*|, |r_2^*|, \ldots, |r_{m'}^*|$ be the sector size vector
of Q*, with $|r_i^*| \le |r_j^*|$ for $i \le j$.

Note that $f_1(2,N,SS)/2 = f_1(2,N,SS^*)/2$.

Given any address trace, A, and the sector ordering of the programs
Q and Q* for that address trace, we can easily compute
$ST = S^1, S^2, \ldots, S^L$ for Q and $ST^* = S^{*1}, S^{*2}, \ldots, S^{*L}$ for Q*,
where $S^t \in Q$ and $S^{*t} \in Q^*$.

Note that, if $S^{*t} = S_{ij}^*$, then $S^t = S_i$ for $1 \le t \le L$.

Thus, we can also compute ST from ST*.
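
As an illustration of this construction, the following Python sketch splits a sector using the "in practice" rule (every subsector of size $|r_k|$ except the last, which absorbs the remainder and stays below $2|r_k|$) and then derives ST* from references given as (sector, offset) pairs. The function names, the byte-offset representation of references, and the numbering of subsectors from 1 are assumptions made for the example, not definitions from the report.

```python
def split_sector(size, r_k):
    """Return the subsector sizes for a sector of `size` bytes.
    Every piece is at least r_k bytes; the last is below 2 * r_k."""
    if size < 2 * r_k:
        return [size]  # cannot be split without creating a piece < r_k
    count = size // r_k
    return [r_k] * (count - 1) + [size - r_k * (count - 1)]


def subsector_trace(references, parts):
    """Map (sector, offset) references to (sector, j) subsector references.
    `parts[sector]` is the list returned by split_sector for that sector.
    Offsets are assumed to lie inside the sector."""
    trace = []
    for sector, offset in references:
        j, upper = 1, parts[sector][0]
        while offset >= upper:
            upper += parts[sector][j]
            j += 1
        trace.append((sector, j))
    return trace


def original_trace(st_star):
    """Recover ST from ST*: drop the subsector index."""
    return [sector for sector, _ in st_star]
```

For instance, `split_sector(4500, 1000)` yields three 1000-byte subsectors and a final 1500-byte one, so the smallest subsector is never smaller than $|r_k|$.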

Theorem 5 is presented in terms of the above definitions of Q* and
ST*.

Theorem 5.

Given any two-level virtual memory system V, with page size N,
primary memory size |Mp|, any valid page replacement algorithm Ra,
demand page fetch Fd, and any sector trace STa, then for any partition,
Πa, of the relocatable sectors into the logical pages of the program, Q,
the minimum number of page fetches given by the page fetch function,
FFp, is lower bounded by

\[
FFp(|Mp|, N, \Pi_a, STa, Fd, Ra) \;\ge\;
\frac{FFs(|Ms| = f_1(|Mp|,N,SS^*),\, ST = STa^*, Fd, Ro)}{f_1(2,N,SS)/2} - A,
\]

\[
\text{where } A = \frac{2 f_1(1,N,SS) - f_1(2,N,SS)}{f_1(2,N,SS)},
\]

and where the value of the sector fetch function FFs is the number of
sector fetches which occur in a two-level virtual memory system V, with
primary memory size $|Ms| = f_1(|Mp|,N,SS^*)$, using demand fetch Fd,
optimum replacement algorithm Ro, and sector trace ST = STa*. The
function $f_1$ is as previously defined, SS is the sector size vector of Q,
and SS* is the sector size vector of Q*.

Proof:

Let Q, ST, Q* and ST* be exactly as defined immediately before
Theorem 5 was stated.

Let $\Pi_a^* = \{\Pi_1^*, \Pi_2^*, \ldots, \Pi_n^*\}$ be any partition of the
relocatable sectors of Q* into logical pages, where page k = $\Pi_k^*$
for $1 \le k \le n$ and $\sum_{S_{ij}^* \in \Pi_k^*} |S_{ij}^*| \le N$.

Applying Theorem 4 to Q* gives, by simple substitution,

\[
FFp(|Mp|, N, \Pi_a^*, ST^*, Fd, Ra) \;\ge\;
\frac{FFs(|Ms| = f_1(|Mp|,N,SS^*),\, ST^*, Fd, Ro)}{f_1(2,N,SS^*)/2} - A,
\]

and since $f_1(2,N,SS^*)/2 = f_1(2,N,SS)/2$ we get

\[
FFp(|Mp|, N, \Pi_a^*, ST^*, Fd, Ra) \;\ge\;
\frac{FFs(|Ms| = f_1(|Mp|,N,SS^*),\, ST^*, Fd, Ro)}{f_1(2,N,SS)/2} - A.
\]

Let $\Pi_a = \{\Pi_1, \Pi_2, \ldots, \Pi_n\}$ be any partition of relocatable sectors
of Q into logical pages such that $\sum_{S_i \in \Pi_k} |S_i| \le N$ and $S_i \in Q$,
where page k = $\Pi_k$ for $1 \le k \le n$.

Given any Πa, we construct Πar* as follows:

$\Pi_{ar}^* = \{\Pi r_1^*, \Pi r_2^*, \ldots, \Pi r_n^*\}$ such that,
for all $S_i \in \Pi_k$, $S_{ij}^* \in \Pi r_k^*$
for $1 \le j \le l_i$, and page k = $\Pi r_k^*$ for $1 \le k \le n$.

Now,

\[
FFp(|Mp|, N, \Pi_{ar}^*, ST^*, Fd, Ra) \;\ge\;
\frac{FFs(|Ms| = f_1(|Mp|,N,SS^*),\, ST^*, Fd, Ro)}{f_1(2,N,SS)/2} - A,
\]

since the set of all Πar* is a subset of the set of all Πa*.

Now we prove that

\[
FFp(|Mp|, N, \Pi_{ar}^*, ST^*, Fd, Ra) = FFp(|Mp|, N, \Pi_a, ST, Fd, Ra).
\]

We need to show that the page trace
$P^* = p^{*1}, p^{*2}, \ldots, p^{*L}$, computed from Πar* and ST*, is the
same as the page trace $P = p^1, p^2, \ldots, p^L$, computed from Πa and ST.
Let $ST^* = S^{*1}, S^{*2}, \ldots, S^{*L}$ and $ST = S^1, S^2, \ldots, S^L$.

Let the sector referenced in ST* at time t be $S^{*t}$ for $1 \le t \le L$.
Then $S^{*t} = S_{ij}^*$ for some $1 \le i \le m$ and $1 \le j \le l_i$,
and $S_{ij}^* \in \Pi r_k^*$ for some $1 \le k \le n$. Hence, $p^{*t} = k$.

Given $S^{*t} = S_{ij}^*$, then $S^t = S_i$, and, given $S_{ij}^* \in \Pi r_k^*$,
then $S_i \in \Pi_k$. Hence, $p^t = k$, and we have $p^{*t} = p^t$ for $1 \le t \le L$.

Therefore,

\[
FFp(|Mp|, N, \Pi_{ar}^*, ST^*, Fd, Ra) = FFp(|Mp|, N, \Pi_a, ST, Fd, Ra)
\]

and

\[
FFp(|Mp|, N, \Pi_a, ST, Fd, Ra) \;\ge\;
\frac{FFs(|Ms| = f_1(|Mp|,N,SS^*),\, ST^*, Fd, Ro)}{f_1(2,N,SS)/2} - A.
\]

QED.



The following simple example is given to illustrate that Theorem 5
can produce a tighter bound than Theorem 4. This example is made as
simple as possible such that the mechanics of applying Theorem 5 can be
presented.

Example:

Let $Q = \{S_1, S_2, \ldots, S_{12}\}$ where $|S_i| = 1000$ bytes for
$1 \le i \le 8$, and $|S_i| = 4000$ bytes for $8 < i \le 12$, and N = 4000 bytes. Now
let's divide $S_i$ for $8 < i \le 12$ into four parts, each being 1000 bytes
long; i.e., $S_i$ becomes $\{S_{i1}, S_{i2}, S_{i3}, S_{i4}\}$ where
$|S_{ij}| = 1000$ bytes for $1 \le j \le 4$. Thus,

$Q^* = \{S_1, S_2, S_3, \ldots, S_8, S_{9,1}, S_{9,2}, S_{9,3}, S_{9,4}, \ldots, S_{12,1}, S_{12,2}, S_{12,3}, S_{12,4}\}$.

Let $ST^* = S_1, S_2, S_3, \ldots, S_8, S_{9,1}, S_{9,2}, S_{9,3}, S_{9,4}, \ldots, S_{12,1}, S_{12,2}, S_{12,3}, S_{12,4}$.

This represents the compressed reference behavior of one pass through
Q* where every unit of Q* is touched. It is reasonable to assume that
such sector behavior could represent one pass through a small loop of a
much larger real program.

Now, $ST = S_1, S_2, S_3, \ldots, S_8, S_9, S_9, S_9, S_9, \ldots, S_{12}, S_{12}, S_{12}, S_{12}$.

Evaluating FFp(|Mp|, N = 4000, Πa, ST, Fd, Ra) gives 6 page fetches when
$\Pi_1 = \{S_1, S_2, S_3, S_4\}$, $\Pi_2 = \{S_5, S_6, S_7, S_8\}$ and
$\Pi_i = \{S_{i+6}\}$ for $2 < i \le 6$, and |Mp| and Ra take on any values. It
should be clear that this partition minimizes FFp.

Theorem 4 gives a lower bound for FFp of

\[
\frac{FFs(|Ms| = f_1(|Mp|,N,SS),\, ST, Fd, Ro)}{f_1(2,N,SS)/2} - A = (12/4) - 0 = 3,
\]

for all values of |Ms|. Note that $f_1(2,N,SS) = 8$ and
$f_1(1,N,SS) = 4$, hence A = 0. Theorem 5 gives a lower bound for FFp of

\[
\frac{FFs(|Ms| = f_1(|Mp|,N,SS^*),\, ST^*, Fd, Ro)}{f_1(2,N,SS)/2} - A = (24/4) - 0 = 6,
\]

for all values of |Ms|. Thus, Theorem 5 gives the greater lower bound,
and it is a factor of 2 better than the bound given by Theorem 4.
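
The arithmetic of this example can be checked mechanically. The Python sketch below rebuilds ST and ST*, counts fetches, and evaluates both bounds. It relies on a shortcut that holds only for this particular trace: every sector (and every page) is referenced in one contiguous run, so the number of demand fetches equals the number of runs for any memory size and any replacement rule. The identifiers are illustrative, and the values $f_1(1,N,SS) = 4$ and $f_1(2,N,SS) = 8$ are taken from the text rather than computed.

```python
from itertools import groupby

small = [f"S{i}" for i in range(1, 9)]     # eight 1000-byte sectors
large = [f"S{i}" for i in range(9, 13)]    # four 4000-byte sectors

# ST: each large sector is referenced four times in a row.
st = small + [s for s in large for _ in range(4)]
# ST*: the same pass expressed over the 1000-byte subsectors.
st_star = small + [f"{s},{j}" for s in large for j in range(1, 5)]


def count_runs(trace):
    """Number of maximal runs of equal consecutive references."""
    return sum(1 for _ in groupby(trace))


# Partition from the example: pages 1 and 2 hold four small sectors each,
# pages 3..6 hold one large sector each.
page_of = {s: 1 + i // 4 for i, s in enumerate(small)}
page_of.update({s: 3 + i for i, s in enumerate(large)})
page_trace = [page_of[s] for s in st]

f1_one, f1_two = 4, 8
a = (2 * f1_one - f1_two) / f1_two

page_fetches = count_runs(page_trace)
bound_theorem4 = count_runs(st) / (f1_two / 2) - a
bound_theorem5 = count_runs(st_star) / (f1_two / 2) - a
print(page_fetches, bound_theorem4, bound_theorem5)
```

Under these assumptions the script reports 6 page fetches, a Theorem 4 bound of 3.0, and a Theorem 5 bound of 6.0, matching the figures in the example.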
Now we extend Theorems 4 and 5 to include the cases where sectors
can be any size, and we let the sectors cross page boundaries.

We now present Theorem 6, which lower bounds FFp over all sector
orderings SO into the n-page logical address space. The sectors can be
any size and may cross page boundaries. This model corresponds to the
case where sectors are clustered together into groups and then these
groups are packed into the virtual address space.

Since sectors may cross page boundaries, one may not be able to
determine the page trace from the sector trace ST. We define SOT to be
the sector trace consisting of ordered pairs of elements:

$SOT = (S^1,O^1), (S^2,O^2), \ldots, (S^L,O^L)$, where $S^t$ is the
sector referenced at time t and $O^t$ is the offset in $S^t$ referenced at
time t. Given a sector trace SOT and a sector ordering SO as defined in
Chapter 2, the page trace follows immediately.
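
A minimal Python sketch of this computation follows. It assumes that the sector ordering packs sectors contiguously from address 0 with no padding, that offsets are in bytes, and that pages are numbered from 0; the report's own conventions from Chapter 2 may differ on each of these points, and the function name `page_trace` is hypothetical.

```python
from itertools import accumulate


def page_trace(sector_order, sizes, sot, page_size):
    """Compute the page trace for references `sot` = [(sector, offset), ...]
    when sectors are laid out back to back in `sector_order`."""
    # Start address of each sector = sum of the sizes of the sectors before it.
    starts = dict(zip(
        sector_order,
        accumulate([0] + [sizes[s] for s in sector_order[:-1]]),
    ))
    return [(starts[s] + offset) // page_size for s, offset in sot]
```

With sectors a (3000 bytes), b (3000 bytes) and c (2000 bytes) laid out in that order and a 4000-byte page, sector b straddles the boundary between pages 0 and 1, so a reference to b can land on either page depending on its offset. This is why the offset must be carried in the trace.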

Note that SOT* is exactly the same as ST* except that the
elements of SOT* are simply ordered pairs. Also note that the
construction of Q* is not affected by allowing sectors to cross page
boundaries.

Theorem 6.

Given any two-level virtual memory system V, with page size N,
primary memory size |Mp|, any valid page replacement algorithm Ra,
demand page fetch Fd, and any sector trace SOTa, then for any sector
ordering SOa, of the relocatable sectors into the logical address space
of the program Q, the minimum number of page fetches given by the page
fetch function FFp, is lower bounded by

A.

\[
FFp(|Mp|, N, SOa, SOTa, Fd, Ra) \;\ge\;
\frac{FFs(|Ms| = f_1^u(|Mp|,N,SS),\, ST = SOTa, Fd, Ro)}{f_1^u(2,N,SS)/2} - A
\]

and by

B.

\[
FFp(|Mp|, N, SOa, SOTa, Fd, Ra) \;\ge\;
\frac{FFs(|Ms| = f_1^u(|Mp|,N,SS^*),\, ST = SOTa^*, Fd, Ro)}{f_1^u(2,N,SS)/2} - A,
\]

\[
\text{where } A = \frac{2 f_1^u(1,N,SS) - f_1^u(2,N,SS)}{f_1^u(2,N,SS)},
\]

and where the value of the sector fetch function FFs is the number of
sector fetches which occur in a two-level virtual memory system V, with
primary size |Ms|, using demand fetch Fd and optimum replacement Ro, and
sector trace ST = SOTa in part A and ST = SOTa* in part B.



Proof of Theorem 6:

Let $SOTa = (S^1,O^1), (S^2,O^2), \ldots, (S^L,O^L)$, where $S^t$ is
the sector referenced at time t and $O^t$ is the offset. For virtual
memory system V and FFp, let:

1. SOa be any sector ordering of the relocatable sectors in the n
pages of the address space of program Q.

2. $P = p^1, p^2, \ldots, p^L$ be the resultant page trace computed
uniquely from SOTa and SOa, such that $p^t = (L(S^t) + O^t)/N$.

3. $Fa = f_a^1, f_a^2, \ldots, f_a^L$ be the demand fetch policy, where
$f_a^t = \{p^t\}$ or $\emptyset$; $f_a^t \cap Mp^{t-1} = \emptyset$. Note that we have
chosen to denote Fd of the FFp model by Fa to avoid notational
conflict with the Fd of the FFs model.

4. $Ra = r_a^1, r_a^2, \ldots, r_a^L$ be any removal policy under demand
fetch, where $r_a^t \subseteq Mp^{t-1}$ and $|r_a^t| = 1$ or 0.

5. $Mp^t = (Mp^{t-1} - r_a^t) \cup f_a^t$ and $Mp^0 = \emptyset$.

First we prove the following lemma.

Lemma 6:

There exists a valid demand fetch and removal policy, Fd and Rd,
for the FFs model such that

\[
FFp(|Mp|, N, SOa, SOTa, Fa, Ra) \;\ge\;
\frac{FFs(|Ms| = f_1^u(|Mp|,N,SS),\, SOTa, Fd, Rd)}{f_1^u(2,N,SS)/2} - A,
\]

\[
\text{where } A = \frac{2 f_1^u(1,N,SS) - f_1^u(2,N,SS)}{f_1^u(2,N,SS)}.
\]

For the FFs model, Fd and Rd will be constructed by forming a valid
sequence of replacement and fetch policies
$(F_1,R_1), (F_2,R_2), \ldots, (F_h,R_h)$, where:

1. $F_1 = f_1^1, f_1^2, \ldots, f_1^L$, and $f_1^t = g(f_a^t)$ = the set of
sectors having any of their parts in $f_a^t$, for $1 \le t \le L$.

2. Similarly, $R_1 = r_1^1, r_1^2, \ldots, r_1^L$, and
$r_1^t = g(r_a^t)$ = the set of sectors having any of their parts in
$r_a^t$, for $1 \le t \le L$.

3. $F_h = Fd = f_d^1, f_d^2, \ldots, f_d^L$, and

4. $R_h = Rd = r_d^1, r_d^2, \ldots, r_d^L$, for $1 \le t \le L$, where
$f_d^t = r_d^t = \emptyset$ if $x^t \in M_d^{t-1}$;
$f_d^t = x^t$ and $r_d^t = \emptyset$ if $x^t \notin M_d^{t-1}$ and $|M_d^{t-1}| < |Ms|$;
$f_d^t = x^t$ and $r_d^t = \{b\} \subseteq M_d^{t-1}$ if $x^t \notin M_d^{t-1}$ and
$|M_d^{t-1}| = |Ms|$; and $M_d^t = (M_d^{t-1} - r_d^t) \cup f_d^t$ to
satisfy demand sectoring.

Lemma 4a is still true for this case when sectors may cross page
boundaries. The proof of Lemma 4a when sectors are allowed to cross
page boundaries is exactly the same as before except that we add the
following to the proof. (Recall that $z^t$ is the sector referenced at
time t.)

If it ever occurs that $z^t \in f_1^t$ while $z^t$ is already in the sectored
memory, then simply remove $z^t$ from $f_1^t$. This only reduces the value of
$|f_1^t|$, and it keeps sector $z^t$ from being added to the deferral sector
list when $z^t$ is in the sectored memory.

Since $|Ms| = f_1^u(|Mp|,N,SS) \ge ||Mp||$, Lemma 4a says that the
above construction exists such that

\[
\sum_{t=1}^{L} |f_1^t| \;\ge\; \sum_{t=1}^{L} |f_d^t| = FFs(|Ms| = f_1^u(|Mp|,N,SS),\, SOT, Fd, Rd).
\]

Fact 3

\[
\sum_{t=1}^{L} |f_1^t| \;\le\;
\frac{f_1^u(2,N,SS)\, FFp(|Mp|, N, SO, SOT, Fa, Ra) + f_1^u(2,N,SS) \cdot A}{2}
\]

The proof of Fact 3 is exactly the same as Fact 2 of Theorem 4
except that $|g(f_a^t)|$ becomes the number of sectors having any of
their parts in $f_a^t$.

Hence Lemma 6 is true. Lemma 6 and Lemma 1 prove part A of the
Theorem.



Proof of part B.

Given any address trace A and any Q, construct Q*, SOT, and
SOT* exactly as in Theorem 5, except denote the elements of SOT and
SOT* as ordered pairs.

The proof of part B is almost exactly the same as the proof of
Theorem 5. We point out the exceptions below.

Instead of applying Theorem 4 to Q* as in Theorem 5, we apply
part A of Theorem 6 to Q* and use the fact that
$f_1^u(2,N,SS) = f_1^u(2,N,SS^*)$ to get

\[
FFp(|Mp|, N, SOa^*, SOT^*, Fd, Ra) \;\ge\;
\frac{FFs(|Ms| = f_1^u(|Mp|,N,SS^*),\, SOT^*, Fd, Ro)}{f_1^u(2,N,SS)/2} - A.
\]

In the proof of Theorem 5, we restricted the set of Πa* such that
subsectors could not be in different pages. Here we restrict the set
of all SOa* to get the subset of restricted orderings SOar*. An ordering
x in the set of all SOa* belongs to the restricted subset if the
subsectors of each sector in x occur together as a subsequence of x,
and if the subsectors of each sector are
ordered in the subsequence as they occur in the sector. We are simply
restricting the set of all SOa* such that we get the set of all SOa
when the common subsectors of each subsequence of each SOar* are
concatenated together.

Since the above bound is true for all SOa*, it must
be true for any constrained subset of SOa*. In particular it must be
true for all SOar*. Thus

\[
FFp(|Mp|, N, SOar^*, SOT^*, Fd, Ra) \;\ge\;
\frac{FFs(|Ms| = f_1^u(|Mp|,N,SS^*),\, SOT^*, Fd, Ro)}{f_1^u(2,N,SS)/2} - A.
\]

Now we need to show, as in Theorem 5, that the page trace P*
computed from SOar* and SOT* is the same as the page trace P
computed from SOT and SOa. This is obvious from the construction of
SOar* and SOT*. That is, $p^{*t}$ computed from $(S^{*t}, O^{*t})$
and SOar* must be the same as $p^t$ computed from $(S^t, O^t)$ and SOa.
Thus, FFp(|Mp|,N,SOa,SOT,Fd,Ra) = FFp(|Mp|,N,SOar*,SOT*,Fd,Ra)
and the proof of B follows immediately. QED.

3.6 Bounds for Working Set Management

Theorems 1-5 give upper and lower bounds on the number of page
fetches required to execute a program in any fixed primary memory size.
However, there are paging algorithms which exploit the important program
property of locality by attempting to dynamically allocate various
amounts of primary memory space to a program as it executes. Recall
that, intuitively, locality means that during a given interval of
execution a program addresses only a subset of total addressable space.
However, for different intervals, the size of this subset may vary.
From this notion of locality comes that of "working sets", and a theory
of primary memory based on this notion has been proposed and extensively
investigated in [D1,D2,D3]. Therefore, we will extend our definition of
the page fetch function, FFp, to include working set memory management.

In order to incorporate the page working set concept into the
methodology we adopted in Chapter 2 for presenting paging algorithms,
recall the following definitions. Assume that:

$Q = \{A, B, \ldots\}$ is a finite set of logical pages.

$P = p^1, p^2, \ldots, p^L$ is a page trace with $p^t \in Q$.

$Mp^t \subseteq Q$ is the contents of Mp at time t.

$F = f^1, f^2, \ldots, f^L$ is the page fetch policy.

$R = r^1, r^2, \ldots, r^L$ is the page replacement policy.

A paging algorithm based on the page working set principle is defined as
follows.






a. $Wp(0,T) = \emptyset$

b. $Mp^t = Wp(t,T)$ and $|Mp^t| = wp(t,T)$, $0 \le t \le L$

c. $f^t = \emptyset$ if $p^t \in Wp(t-1,T) = Mp^{t-1}$, $1 \le t \le L$

d. $f^t = p^t$ if $p^t \notin Wp(t-1,T) = Mp^{t-1}$, $1 \le t \le L$

e. $r^t = Wp(t-1,T) - Wp(t,T)$; note that $|r^t| \le 1$, $1 \le t \le L$

Thus, we see that under a page working set strategy, the contents of
primary memory at time t, $Mp^t$, is simply the working set, Wp(t,T), and
that the amount of primary memory allocated to a program expands and
contracts as the working set size wp(t,T) expands and contracts. A page
reference at time t, $p^t$, causes a page fetch into primary memory if
and only if $p^t$ is not in the working set at time t-1. Note also that a
page is removed from primary memory at time t if and only if it is in
the working set at time t-1 and it is no longer in the working set at
time t.
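
The definition above can be exercised with a short Python sketch that produces the fetch sets and removal sets step by step. It assumes that the working set W(t,T) consists of the distinct items among the last T references up to and including time t, and it indexes time from 0 for convenience; the function name `working_set_run` is hypothetical.

```python
def working_set_run(trace, window):
    """Return (fetches, removals): for each time step, the set fetched
    and the set removed under working set memory management."""
    fetches, removals = [], []
    previous = set()  # W(t-1, T); empty before the first reference
    for t, item in enumerate(trace):
        # W(t, T): distinct items among the last `window` references.
        current = set(trace[max(0, t - window + 1): t + 1])
        fetches.append({item} - previous)   # fetch iff not in W(t-1, T)
        removals.append(previous - current) # left the working set at time t
        previous = current
    return fetches, removals
```

Because the window slides forward by exactly one reference per step, at most one item can drop out of the working set at any time, which is the property $|r^t| \le 1$ noted in item e.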

From the above discussion, we observe that the number of page
fetches required by a program during its execution using the page
working set memory management technique is uniquely determined from the
page trace, P, and the working set parameter, T. Therefore, the
definition of the page fetch function, FFp, under page working set
memory management can be expressed as a function of the following
parameters:

\[ FFp = FFp(|Mp^t| = wp(t,T),\, N,\, \Pi_a,\, STa,\, Wp(t,T)). \]

The parameters in this definition of FFp for working sets are
identical to those previously presented for the page fetch function,
FFp, except for two instances. The first parameter, which denotes the
primary memory size, is equated to wp(t,T) to illustrate that the size
of Mp varies with the size of the working set. The other instance is
strictly notational, i.e., we have replaced the fetch and replacement
parameters, F and R, with Wp(t,T) to illustrate that the F and R
policies are those defined for working set memory management. We could
have used Fw and Rw, but we think that Wp(t,T) is simply clearer.

We can also extend the definition of the sector fetch function,
FFs, such that it denotes the number of sector fetches which occur in a
virtual memory system during the processing of a sector trace under
sector working set memory management.

Consider a program whose behavior is modeled by a sector trace, ST.
Then the sector working set at time t, Ws(t,T), is defined to be the
distinct set of sectors referenced in the sector trace, ST, during the
time interval (t-T, t). The number of sectors in the sector working set
at time t is defined to be the sector working set size and is denoted by
ws(t,T). The maximum value of the sector working set size for a given
sector trace is denoted ws(t,T)max. Note that $ws(t,T)max \le T$. Let:

a. Program = {a, b, ...}, a finite set of relocatable sectors.

b. $ST = S^1, S^2, \ldots, S^L$, a sector trace with $S^t \in$ Program.

c. $Ms^t \subseteq$ Program, the set of sectors in primary memory
at time t.

d. $F = f^1, f^2, \ldots, f^L$, the sector fetch policy.

e. $R = r^1, r^2, \ldots, r^L$, the sector replacement policy.

Then the sector behavior of a program using sector working set memory
management is defined as:

a. $Ws(0,T) = \emptyset$

b. $Ms^t = Ws(t,T)$ and $|Ms^t| = ws(t,T)$, $0 \le t \le L$

c. $f^t = \emptyset$ if $S^t \in Ws(t-1,T) = Ms^{t-1}$, $1 \le t \le L$

d. $f^t = S^t$ if $S^t \notin Ws(t-1,T) = Ms^{t-1}$, $1 \le t \le L$

e. $r^t = Ws(t-1,T) - Ws(t,T)$, $1 \le t \le L$

Thus, the contents of primary memory at time t is the sector working set
at time t, Ws(t,T), and a sector reference at time t causes a sector
fetch if and only if $S^t \notin Ws(t-1,T)$. Note that the set of sectors that
are generated by the sector working set strategy to be in primary memory
at time t is Ws(t,T), no matter what the sizes of the individual sectors
are.

The sector fetch function, FFs, for the sector working set strategy
becomes

\[ FFs = FFs(|Ms^t| = ws(t,T),\, ST,\, Ws(t,T)). \]

We observe, as before, that the value of the sector fetch function, FFs,
which is the number of sector fetches required to process a sector
trace, is uniquely determined by the ST and the Ws(t,T) parameters.

The notion of characterizing the local behavior of a program in
terms of its sector working set has two potential applications. The
first is to utilize the time-varying sector working set to identify the
sectors which should be clustered together in order to minimize page
faults. This application turns out to be very useful and is discussed
in full detail in Chapter 4. The second is to find upper and lower
bounds on the paging behavior, FFp, of programs using the page working
set strategy in terms of the sector behavior, FFs, using sector working
set memory management. This approach proves successful for the upper
bounds but fails for the lower bounds. Even though the approach fails
to produce lower bounds, Part A of the following theorem points out an
interesting relationship that can exist between the number of page
fetches and the number of sector fetches for programs using working set
memory management.



3.6.1 Lower Bounds for Working Set Management

Recall that ws(t,T)max is defined to be the maximum value of the
sector working set size for a given sector trace.

Theorem 7

Given any two-level virtual memory system V, with page size N,
primary memory size $|Mp^t| = wp(t,T)$, using paged working set memory
management Wp(t,T), and sector trace STa, then for any partition, Πa, of
the relocatable sectors into logical pages of the program where each
page contains k or fewer sectors, the minimum number of page fetches,
given by $FFp(|Mp^t| = wp(t,T), N, \Pi_a, STa, Wp(t,T))$,

A. is not lower bounded by

\[ \frac{FFs(|Ms^t| = ws(t, k_1 T),\, ST = STa,\, Ws(t, k_1 T))}{k \cdot k_2}, \text{ and} \]

B. is not lower bounded by $FFs(|Ms| = k \cdot T,\, ST = STa,\, Fd,\, R = LRU)$, but

C. is lower bounded by

\[ \frac{FFs(|Ms| = k \cdot ws(t,T)max,\, ST = STa,\, Fd,\, Ro)}{k}, \]

where the value of the sector fetch function, FFs, is the number of
sector fetches which occur in a two-level virtual memory system V, with
primary memory size |Ms|, with the same sector trace ST = STa, using
sector working set management in Part A, using demand fetch, LRU
replacement in Part B and using demand fetch, optimum replacement in
Part C. The values of $k_1$ and $k_2$ are any arbitrarily large integers
greater than 1. (The value of $f_1$ is as previously defined, and SS is
the sector size vector.) The value of ws(t,T)max is the maximum value
of ws(t,T) over ST.



Part A of the above theorem states that there are sector traces
such that the number of sector fetches required to process the sector
trace is arbitrarily larger than the number of page fetches required to
process the corresponding page trace under a good sector ordering.
Moreover, it states that this is even true when the window size of the
sector working set is made arbitrarily large and the resulting number of
sector fetches is divided by an arbitrarily large constant. We claim that
this is counter-intuitive, because a) if the sector working set window
size were simply kT, then the sector working set could contain the same
number of sectors as those contained in a page working set of size T,
and b) dividing FFs by k alone would account for the fact that as many
as k sector fetches are required to bring a page of information into
primary memory.

Proof of Part A: 

We need to show that there exists a set of parameters such that

$$FF_p(|M_p^t| = w_p(t,T),\, N,\, \Pi,\, ST,\, W_p(t,T)) < \frac{FF_s(|M_s^t| = w_s(t,k_1 T),\, ST,\, W_s(t,k_1 T))}{k \cdot k_2}$$

Let:

$T = 2$

$k_1, k_2$ = any fixed arbitrarily large integers.

$m > k_1 T$

$n > k \cdot k_2$

$k = 2$

Program = $\{a, b, x, y\}$, a set of 4 relocatable sectors
each of size $S$, where $S = N/k$.

$ST = ((ax)^m (by)^m)^n$ be the sector trace.

$\Pi = \{(a,b), (x,y)\}$, where page $A = (a,b)$ and page $X = (x,y)$.

$P = ((AX)^m (AX)^m)^n = (AX)^{2mn}$ is the page trace.

$W_p(0,T) = W_s(0,k_1 T) = \emptyset$.

a. Now, it is clear from the definition of $W_p(t,T)$ and
$P = (AX)^{2mn}$ that

$$FF_p(|M_p^t| = w_p(t,T),\, N,\, \Pi,\, ST,\, W_p(t,T)) = 2 \quad \text{for all } m \text{ and } n.$$

b. Now, to evaluate $FF_s$,

$ST = ((ax)^m (by)^m)^n$ implies $FF_s(|M_s^t| = w_s(t,k_1 T),\, ST,\, W_s(t,k_1 T)) = 4n$.

Proof:

Part 1.

Consider the substring reference pattern $(ax)^m$. Observe that the
first reference to this substring occurs at times $t = 1 + 4mi$ for
$i = 0, 1, \ldots, n-1$. $W_s(0,k_1 T) = \emptyset$ by definition, and

$$W_s(t,k_1 T) = \{b, y\} \quad \text{for } t = 1 + 4mi,\ i = 1, 2, \ldots, n-1.$$

This is true because for each of these times, $t$, the last $2m$ references
were to $b$ or $y$. Since $2m > k_1 T$, only $b$ and $y$ can be in $W_s(t,k_1 T)$;
and since $k_1 T > 2$, both $b$ and $y$ must be in $W_s(t,k_1 T)$.
Hence, for each of the $n$ occurrences of the substring $(ax)^m$ in the
sector trace, exactly two sector fetches are required to bring $a$ and $x$
into the working set, where they stay while processing the remaining
references in the substring, since $k_1 T > 2$.

Part 2.

Consider the substring reference pattern $(by)^m$. The first
reference to this substring occurs at times $t = 1 + m(4i-2)$ for
$i = 1, 2, \ldots, n$.

Then $W_s(t,k_1 T) = \{a, x\}$ for $t = 1 + m(4i-2)$, $i = 1, 2, \ldots, n$, since at each
of these times, $t$, the last $2m$ references were to $a$ or $x$. Since
$m > k_1 T$, only $a$ and $x$ can be in $W_s(t,k_1 T)$; and
since $k_1 T > 2$, both $a$ and $x$ must be in $W_s(t,k_1 T)$.
Thus, for each of the $n$ occurrences of the substring $(by)^m$ in the
sector trace, two sector fetches are required to bring $b$ and $y$ into
$W_s(t,k_1 T)$, and moreover only two are required since $k_1 T > 2$.

Therefore, $FF_s(|M_s^t| = w_s(t,k_1 T),\, ST,\, W_s(t,k_1 T)) = 4n$.

Now,

$$\frac{FF_s}{k \cdot k_2} = \frac{4n}{k \cdot k_2} > \frac{4 \cdot k \cdot k_2}{k \cdot k_2} = 4 > FF_p = 2,$$

and this proves Part A of Theorem 7.
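
The counts used in the proof of Part A can be checked numerically. The following is a minimal sketch, not part of the original report: the function names are hypothetical, and it assumes that the working set seen by a reference is the set of distinct items among the `window` references immediately preceding it.

```python
def working_set_fetches(trace, window):
    """Count references whose item is absent from the working set.

    Assumption: the working set seen by the reference at position t is the
    set of distinct items among the `window` references just before it.
    """
    fetches = 0
    for t, item in enumerate(trace):
        if item not in trace[max(0, t - window):t]:
            fetches += 1
    return fetches


def part_a_example(m, n, T=2, k1=3):
    """Return (FFp, FFs) for ST = ((ax)^m (by)^m)^n with pages A and X."""
    sector_trace = (["a", "x"] * m + ["b", "y"] * m) * n
    page_of = {"a": "A", "b": "A", "x": "X", "y": "X"}
    page_trace = [page_of[s] for s in sector_trace]
    ff_p = working_set_fetches(page_trace, T)          # page window T
    ff_s = working_set_fetches(sector_trace, k1 * T)   # sector window k1*T
    return ff_p, ff_s


if __name__ == "__main__":
    print(part_a_example(m=7, n=5))
```

With $m = 7$, $n = 5$, $T = 2$ and $k_1 = 3$ (so that $m > k_1 T$), the sketch returns 2 page fetches and $4n = 20$ sector fetches, in agreement with steps a and b of the proof.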



What causes this strange behavior in the number of sector fetches?
Is it true for only strange and rare sector traces or could it be
expected to occur in many common sector traces? We claim that this
behavior could occur in many sector traces. In order to provide some
insight into this claim, consider the sector trace $ST = \alpha_1 \alpha_2 \alpha_3$,
where $\alpha_2 = ((ax)^m (by)^m)^n$ and $\alpha_1$, $\alpha_3$ represent any long
sector reference strings. The proof of Part A shows that the ratio
$(FF_s/FF_p) > k_2$ for the substring $\alpha_2$, where $k_2$ can be made
arbitrarily large by choosing $n$ sufficiently large. Therefore, the
ratio $(FF_s/FF_p)$ can still be made arbitrarily large for fixed $\alpha_1$ and
$\alpha_3$ by simply making $n$ sufficiently large. A generalization of this brief
argument says that, when a sector trace has any substring consisting of
tight embedded loops, the number of sector fetches may become much
larger than the corresponding number of page fetches. One explanation
of this phenomenon is as follows: tight inner loops (e.g., $(ax)^m$)
drown out the benefit gained by making the sector window size large
(once an inner loop runs longer than the window, the sector working set
holds only the sectors of that loop), while the outer loop
causes the sectors in the inner loops to be fetched over and over. In
contrast, the page working set having a small window size, relative to
$m$, is able to contain all the sectors in the embedded loops (i.e., $(ax)$,
$(by)$) throughout consecutive cycles of the outer loop, if at least one
sector from each inner loop is grouped into the same page.

From the above discussion, we observe that the page working set can
contain more of the most recently referenced sectors than the sector
working set, even when the latter has an arbitrarily large window size.
We can eliminate this condition by redefining the sector working set as
follows. Recall that the sector working set, $W_s(t,T)$, has been defined
to contain the set of distinct sectors referenced in the last $T$
references. If we modified the definition of $W_s(t,T)$ such that it
contains the set of $T$ most recently referenced sectors, and if we choose
$T$ to be $k$ times the page working set window size, then the page working
set could never contain more of the most recently referenced sectors
than those contained by this sector working set. However, this new
definition of the sector working set is equivalent to demand fetch, LRU
replacement in a memory of fixed size equal to $k$ times the page working
set window size. Thus, a plausible conjecture is that the number of
page fetches under a page working set strategy could be lower bounded by
the number of sector fetches under demand fetch, LRU replacement in a
memory size as described above. However, Part B of Theorem 7 states
that this conjecture is not true.

Proof of Part B. 

We have to show that there exists a set of parameters such that

$$FF_p(|M_p^t| = w_p(t,T),\, N,\, \Pi_a,\, ST_a,\, W_p(t,T)) < \frac{FF_s(|M_s| = k \cdot T,\, ST = ST_a,\, F_d,\, R_{LRU})}{k}$$

Let:

Program = $\{a, b, c, d, e, f, g, h\}$, a set of
8 relocatable sectors of size $N/2$.

$k = 2$

$N$ = twice the sector size.

$T = 3$.

$ST$ = (acd bef bgh acd aef b) be the sector trace.

$|ST| = 16$.

$\Pi_a = \{\{a,b\}, \{c,d\}, \{e,f\}, \{g,h\}\}$, where page
$A = \{a,b\}$, page $B = \{c,d\}$, etc.

$P$ = (ABB ACC ADD ABB ACC A) be the resulting page trace.

$W_p(0,T) = M_s^0 = \emptyset$.

$|M_s| = k \cdot T = 6$.

Simulation of paging behavior to get $FF_p$ gives:

    t            1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    P            A  B  B  A  C  C  A  D  D  A  B  B  A  C  C  A
    F_p          A  B  0  0  C  0  0  D  0  0  B  0  0  C  0  0
    W_p(t,T)     A  B  B  A  C  C  A  D  D  A  B  B  A  C  C  A
                    A  A  B  A  A  C  A  A  D  A  A  B  A  A  C
                             B        C        D        B

Each column lists the contents of $W_p(t,T)$ just after the reference at
time $t$; for example, the fifth column is the contents of $W_p(t,T)$
immediately before the 6th reference.

Thus, $FF_p = \sum_{t=1}^{16} |f_p^t| = 6$ page fetches.

Simulation of sector behavior gives:

    t            1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    ST           a  c  d  b  e  f  b  g  h  a  c  d  a  e  f  b
    F_s          a  c  d  b  e  f  0  g  h  a  c  d  0  e  f  b
    M_s          a  c  d  b  e  f  b  g  h  a  c  d  a  e  f  b
                    a  c  d  b  e  f  b  g  h  a  c  d  a  e  f
                       a  c  d  b  e  f  b  g  h  a  c  d  a  e
                          a  c  d  d  e  f  b  g  h  h  c  d  a
                             a  c  c  d  e  f  b  g  g  h  c  d
                                a  a  c  d  e  f  b  b  g  h  c

Results:

$FF_s = \sum_{t=1}^{16} |f_s^t| = 14$ sector fetches.

$FF_p = 6 < FF_s/k = 14/2 = 7$, QED.
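
The two simulations in the proof of Part B can be reproduced with a short program. This is an illustrative sketch only; the function names are hypothetical, and the page working set is assumed to hold the distinct pages among the $T$ references immediately preceding each reference.

```python
from collections import OrderedDict


def working_set_fetches(trace, window):
    """Fetches under working set management with the given window."""
    return sum(1 for t, item in enumerate(trace)
               if item not in trace[max(0, t - window):t])


def lru_fetches(trace, frames):
    """Fetches under demand fetch, LRU replacement with `frames` frames."""
    stack = OrderedDict()            # least recently used entry comes first
    fetches = 0
    for item in trace:
        if item in stack:
            stack.move_to_end(item)  # refresh recency on a hit
        else:
            fetches += 1
            if len(stack) == frames:
                stack.popitem(last=False)  # evict the least recently used
            stack[item] = True
    return fetches


SECTOR_TRACE = list("acdbefbghacdaefb")
PAGE_OF = {"a": "A", "b": "A", "c": "B", "d": "B",
           "e": "C", "f": "C", "g": "D", "h": "D"}
PAGE_TRACE = [PAGE_OF[s] for s in SECTOR_TRACE]

ff_p = working_set_fetches(PAGE_TRACE, 3)   # page working set, T = 3
ff_s = lru_fetches(SECTOR_TRACE, 6)         # sector LRU, |Ms| = k*T = 6

if __name__ == "__main__":
    print(ff_p, ff_s)
```

For the trace of the proof this gives $FF_p = 6$ and $FF_s = 14$, so that $FF_p < FF_s/k$ with $k = 2$.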

However, if we change the LRU replacement algorithm of Part B to the 
optimum replacement algorithm, then the value of FFp under page working 
set management can be lower bounded. This lower bound is given by Part 
C of Theorem 7. 

Proof of Part C. 

Note that $|M_p^t| = w_p(t,T) \le w_s(t,T)_{max} \le T$.

a.

$$FF_p(|M_p^t| = w_p(t,T),\, N,\, \Pi_a,\, ST_a,\, W_p(t,T)) \ge FF_p'(|M_p'| = w_s(t,T)_{max},\, N,\, \Pi_a,\, ST_a,\, F_d,\, R_{LRU}),$$

since $M_p^t = W_p(t,T) \subseteq {M'_p}^{t}$, by the definition of $W_p(t,T)$ and the
definition of ${M'_p}^{t}$ under demand fetch, LRU replacement; that is,
${M'_p}^{t}$ always contains the set consisting of the $|M_p'| = w_s(t,T)_{max}$
most recently referenced pages, while $M_p^t$ contains the set consisting
of the $w_p(t,T)$ most recently referenced pages.

b.

$$FF_p'(|M_p'| = w_s(t,T)_{max},\, N,\, \Pi_a,\, ST_a,\, F_d,\, R_{LRU}) \ge \frac{FF_s(|M_s| = k \cdot w_s(t,T)_{max},\, ST = ST_a,\, F_d,\, R_o)}{k}$$

by Theorem 1, and this proves Part C of Theorem 7.



Corollary to Theorem 7, Part C. 

$$FF_p(|M_p^t| = w_p(t,T),\, N,\, \Pi_a,\, ST_a,\, W_p(t,T)) \ge \frac{FF_s(|M_s| = f_1(w_s(t,T)_{max}, N, SS),\, ST = ST_a,\, F_d,\, R_o)}{f_1(2,N,SS)/2} - A,$$

where $A$ and $f_1$ are defined as in Theorem 5.
Proof.

We know from the proof of Part C that

$$FF_p(|M_p^t| = w_p(t,T),\, N,\, \Pi_a,\, ST_a,\, W_p(t,T)) \ge FF_p'(|M_p'| = w_s(t,T)_{max},\, N,\, \Pi_a,\, ST_a,\, F_d,\, R_{LRU}),$$

and applying Theorem 5 to $FF_p'$ proves the corollary immediately.



3.6.2 Upper Bounds for Working Set Management 

An upper bound on the number of page fetches for virtual memory 
systems using the page working set strategy is given in Theorem 8. 

Theorem 8 

Given any two-level virtual memory system V, with page size $N$,
primary memory size $|M_p^t| = w_p(t,T)$, using page working set memory
management $W_p(t,T)$, and any sector trace $ST_a$, then for any partition,
$\Pi_a$, of the relocatable sectors into logical pages, where each page
contains $k$ or fewer sectors, the maximum number of page fetches given by
the page fetch function, $FF_p$, is upper bounded by

$$FF_p(|M_p^t| = w_p(t,T),\, N,\, \Pi_a,\, ST_a,\, W_p(t,T)) \le FF_s(|M_s^t| = w_s(t,T),\, ST,\, W_s(t,T)),$$

where the value of the sector fetch function $FF_s$ is the number of sector
fetches which occur in a two-level virtual memory system V, with
primary memory size $|M_s^t| = w_s(t,T)$, the same sector trace $ST = ST_a$,
using sector working set management $W_s(t,T)$.
Proof: 

Let:

$Q = \{S_1, S_2, \ldots, S_m\}$ = set of relocatable sectors of the program.

$\Pi_a = \{\Pi_1, \Pi_2, \ldots, \Pi_n\}$ be any partition of $Q$
such that $|\Pi_j| \le k$, $1 \le j \le n$.

$ST = x^1, x^2, \ldots, x^L$ be any sector trace, where
$x^t \in Q$, $1 \le t \le L$.

$P = p^1, p^2, \ldots, p^L$ be the page trace, where
$p^t = j$ if $x^t \in \Pi_j$.

$M_p^t = W_p(t,T)$ be the set of pages in memory of $FF_p$ at time $t$.

$M_s^t = W_s(t,T)$ be the set of sectors in memory of $FF_s$ at time $t$.

$F_p = f_p^1, f_p^2, \ldots, f_p^L$ = demand fetch policy of $FF_p$.

$R_p = r_p^1, r_p^2, \ldots, r_p^L$ = working set replacement policy of $FF_p$.

$F_s = f_s^1, f_s^2, \ldots, f_s^L$ = demand fetch policy of $FF_s$.

$R_s = r_s^1, r_s^2, \ldots, r_s^L$ = working set replacement policy of $FF_s$.

Suppose at time $t$, in the $FF_p$ model, that $p^t = j$, the page $j$
containing the set of $\Pi_j$ sectors, is referenced. Then at time $t$, in
the $FF_s$ model, $x^t = a$ is the sector referenced, where $a \in \Pi_j$. We need
to show that

$$\sum_{t=1}^{L} |f_p^t| \le \sum_{t=1}^{L} |f_s^t|.$$

Case 1. Suppose $p^t \in M_p^{t-1} = W_p(t-1,T)$; then $f_p^t = \emptyset$.

a. If $x^t \in M_s^{t-1} = W_s(t-1,T)$, then $f_s^t = \emptyset$ and $|f_p^t| = |f_s^t|$.

b. If $x^t \notin M_s^{t-1} = W_s(t-1,T)$, then $f_s^t = \{a\}$ with $a \notin W_s(t-1,T)$, and
$|f_p^t| < |f_s^t|$.

Case 2. Suppose $p^t \notin M_p^{t-1} = W_p(t-1,T)$; then $f_p^t = \{j\}$.

a. If $x^t \notin M_s^{t-1} = W_s(t-1,T)$, then $f_s^t = \{a\}$ with $a \notin W_s(t-1,T)$, and
$|f_p^t| = |f_s^t|$.

b. If $x^t \in M_s^{t-1} = W_s(t-1,T)$, then $f_s^t = \emptyset$ and
$|f_p^t| > |f_s^t|$. This condition illustrates the only way that page
fetches can exceed sector fetches. However, if we show that
$p^t \notin W_p(t-1,T) \Rightarrow x^t \notin W_s(t-1,T)$, then case 2b can never occur.

Let $p^t \notin W_p(t-1,T)$, and assume $x^t \in W_s(t-1,T)$. Since $x^t \in W_s(t-1,T)$,
there exists a time $t^*$ in the interval $[t-1-T,\ t-1]$ such that
$x^t = x^{t^*}$. Let $p^{t^*} = k$ be the page referenced at time $t^*$ in the
page trace $P$. We know that $x^t \in \Pi_k$, since sectors are not allowed to
cross page boundaries. Because $\Pi_a$ is a partition, $x^t$ lies in exactly
one page, so $p^{t^*} = p^t$. We also know that $p^{t^*} \in W_p(t-1,T)$ because the
window size is $T$ for both the page working set $W_p$ and the sector working
set $W_s$. But this contradicts the assumption; therefore
$x^t \notin W_s(t-1,T)$.

Hence, $\sum_{t=1}^{L} |f_p^t| \le \sum_{t=1}^{L} |f_s^t|$ and the theorem is proved.
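
Theorem 8 can also be spot-checked empirically. The sketch below is an illustration under stated assumptions, not a substitute for the proof: the random trace, the random partition into pages of at most $k$ sectors, and all names are hypothetical, and the working set is taken to be the distinct items among the $T$ references preceding each reference.

```python
import random


def working_set_fetches(trace, window):
    """Fetches under working set management with the given window."""
    return sum(1 for t, item in enumerate(trace)
               if item not in trace[max(0, t - window):t])


def compare_fetches(num_sectors=12, k=3, T=5, length=2000, seed=0):
    """Return (page fetches, sector fetches) for one random experiment."""
    rng = random.Random(seed)
    sectors = list(range(num_sectors))
    rng.shuffle(sectors)
    # Assign shuffled sectors to pages of at most k sectors each.
    page_of = {s: idx // k for idx, s in enumerate(sectors)}
    sector_trace = [rng.randrange(num_sectors) for _ in range(length)]
    page_trace = [page_of[s] for s in sector_trace]
    return (working_set_fetches(page_trace, T),
            working_set_fetches(sector_trace, T))


if __name__ == "__main__":
    for seed in range(5):
        ff_p, ff_s = compare_fetches(seed=seed)
        print(seed, ff_p, ff_s, ff_p <= ff_s)
```

In every run the page fetch count should not exceed the sector fetch count, since a sector found in the sector window implies that its page is found in the page window.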
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CHAPTER 4 
INTERSECTOR REFERENCE MODELS 



4.1 Introduction 

In the previous chapter, we presented upper and lower bounds on the
number of page fetches which could occur in a virtual memory system, for
a given program reference behavior, over any restructuring of the
relocatable sectors into logical pages of the program. The next phase is
to develop and present practical techniques for restructuring a program
to achieve good locality of reference for the program in virtual memory
systems. The task of program reorganization for virtual memory systems
will be separated into two logical parts. The first part is to develop
automatic techniques for identifying the dynamic intersector reference
behavior of programs executing in virtual memory systems. The second
part is to provide clustering procedures which utilize the intersector
reference behavior to rearrange the relocatable sectors of a program into
its logical pages such that good locality of reference exists in the page
trace of the restructured program. The basic idea of the second part is
to assign the most strongly related sectors to common pages.

In this chapter, we address the problem of intersector reference
models. In the next chapter, automatic clustering procedures are
presented, and finally, in Chapter 6, the results of applying these
methods to real programs are investigated and compared with the
theoretical bounds.



4.2 Intersector Reference Models

It is known that a program's page reference patterns have a strong
effect on paging performance in virtual memory systems. It is also known
[H1] that the sector reference behavior of many common programs, such as
compilers, assemblers, editors, etc., proves to be remarkably insensitive
to the input data in rather large domains. For example, the studies of
Hatfield and Gerald [H1] revealed that the groups of sectors which were
used frequently together in the assembly of one program turned out to be
essentially the same as the groups of sectors which were used frequently
together in the assembly of another program. The basic difference
between assemblies was that the groups of sectors which were used
together for short input programs were simply used together more often
for long input programs. Supported by these empirical observations of
Hatfield and Gerald, we decided to characterize the reference behavior of
a program by its sector trace and to base our practical restructuring
methods on this reference behavior. We will elaborate on the soundness
of this decision in Chapter 6 when we compare the paging performance of
real programs over program structures derived from different sector
traces.

Another important reason for basing restructuring methods on a
sector trace is that the results of the last chapter may be used to
compare the paging behavior of a restructured program with the
theoretical best and worst paging behavior for that sector trace.

Given a sector trace, our objective is to specify the strength of
the intersector references such that a clustering procedure that groups
the strongly connected sectors together into logical pages produces a
program structure that tends to minimize the number of page fetches. We
begin by presenting Hatfield and Gerald's [H1] intersector reference
model for defining the strength of connection between sectors.



4.2.1 The HG Intersector Reference Model 

The HG intersector reference model consists of a symmetric matrix,
$H$, showing the strength of connection between the sectors of the program
to be reorganized. Let:

$Q = \{S_1, S_2, \ldots, S_m\}$ be the program of $m$ relocatable sectors;

$ST = S^1, S^2, \ldots, S^L$ be a sector trace of the program.

Then

$$H = [H_{ij}] \text{ for } i, j = 1, 2, \ldots, m, \quad \text{where } H_{ij} = \sum_{t=1}^{L-1} k(i,j,t),$$

where $k(i,j,t) = 1$ if $S^t = i$ and $S^{t+1} = j$, or $S^t = j$ and $S^{t+1} = i$;
and $k(i,j,t) = 0$ otherwise.

Thus, the value of $H_{ij}$ is simply the number of times that sector $i$
referenced sector $j$ plus the number of times that sector $j$ referenced
sector $i$ in the sector trace.

Using this intersector reference model, Hatfield and Gerald were
able to find improvements in the number of page fetches on the order of
two-to-one to ten-to-one by clustering sectors with large $H_{ij}$ values into
the same page. This is the same as clustering sectors into pages such
that the value of $H_{ij}$ is small for $i$ and $j$ in different pages.
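
As an illustration of the definition of $H$, the matrix can be accumulated in one pass over the sector trace. This is a minimal sketch with hypothetical names; it assumes sectors are numbered from 0 to $m-1$ rather than from 1 to $m$.

```python
def hg_matrix(trace, m):
    """Build the symmetric HG matrix H from a sector trace.

    H[i][j] counts the positions t at which sectors i and j are referenced
    consecutively, in either order (sector indices are 0-based here).
    """
    H = [[0] * m for _ in range(m)]
    for cur, nxt in zip(trace, trace[1:]):
        if cur == nxt:
            H[cur][cur] += 1      # k(i, i, t) contributes once
        else:
            H[cur][nxt] += 1
            H[nxt][cur] += 1      # keep H symmetric
    return H


if __name__ == "__main__":
    for row in hg_matrix([0, 1, 0, 2], 3):
        print(row)
```

For the short trace 0, 1, 0, 2, sectors 0 and 1 are adjacent twice and sectors 0 and 2 once, so $H_{01} = H_{10} = 2$ and $H_{02} = H_{20} = 1$.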

Even though these results are quite impressive, the values of $H_{ij}$ in
the HG intersector reference model do not contain any information about
the length of the time interval between successive references of sector $i$
to sector $j$. Hence, the strength of connection, $H_{ij}$, between sectors $i$ and
$j$ is the same for large time intervals and short time intervals.
However, paging may depend quite heavily on the length of these time
intervals. For example, assume that sector $i$ references sector $j$ 100
times ($H_{ij} = 100$) in a sector trace of 200,000 references. Now let's
consider two different plausible examples of how these references could
occur. First, these references could occur with short time intervals
between them such that all 100 references occur within 500 successive
references of the sector trace. Second, these references could occur
with some long time intervals between them such that 10 of these
references could be found in each 20,000 successive references of the
sector trace. Even though the strength of connection is the same for
these two examples, the tendency for a reference from sector $i$ to sector
$j$ to cause a page fetch when they are not in the same page can be
considerably larger in the second example.

Furthermore, the tendency of a reference from sector $i$ to sector $j$
to cause a page fetch is related to such local information as the time
elapsed since the last reference to sector $j$ and the number of distinct
sectors referenced since the last reference to sector $j$ in the sector
trace. If the time is short since sector $j$ was last referenced, and
little virtual memory space was used during that time, it is probable
that sector $j$ is still in primary memory and a new reference will not
cause a page fetch. If the time and space traversed between references
to $j$ is large, it is likely that a page fetch will occur unless $j$ is
grouped into the same page as the referencing sector or some recently
referenced sector. We will now present two intersector reference models
which have potential for identifying and incorporating local sector
reference behavior into the strength of connection between sectors.



4.2.2 Working Set Intersector Reference Models 

The sector working set, $W_s(t,T)$, will be used to define the strength
of connection between sectors for a given sector trace.
Let:

$Q = \{S_1, S_2, \ldots, S_m\}$ be a program of $m$ relocatable sectors.

$ST = S^1, S^2, \ldots, S^L$ be a sector trace of the program,
where $S^t \in Q$.

$P = P^1, P^2, \ldots, P^L$ be the resulting page trace of the
program, where $P^t$ is the page referenced at time $t$.

If $S^t = S_j$ is the sector referenced at time $t$, then we define
$P^t = P_{S_j}$ to denote the page referenced at time $t$. $P_{S_j}$ is to be
interpreted as the page containing sector $j$. We have adopted this
notation to make the following discussion easier to understand.

Recall that the sector working set, $W_s(t,T)$, is defined to be the
set of distinct sectors referenced in the time interval $t-T$ to $t$ of the
sector trace. Similarly, the page working set, $W_p(t,T)$, is the set of
distinct pages referenced in the time interval $t-T$ to $t$ of the page
trace.

FACT 1.

Let $S^t = S_j$, and let $S_j \notin W_s(t-1,T)$. Then $P^t = P_{S_j} \in W_p(t-1,T)$
iff $S_j \in P_{S_i}$ for some $S_i \in W_s(t-1,T)$.

The proof of Fact 1 follows immediately from the definition of
$W_p(t-1,T)$, which is the set of distinct pages in the sequence
$P_{S^{t-1-T}}, P_{S^{t-T}}, \ldots, P_{S^{t-1}}$, and the definition of
$W_s(t-1,T)$, which is the set of distinct sectors in the sequence
$S^{t-1-T}, S^{t-T}, \ldots, S^{t-1}$.

Fact 1 states that, when sector $j$ is referenced at time $t$ and sector
$j$ is not in the sector working set, then the page referenced at time $t$
will be in the page working set if sector $j$ is grouped into a page with
any one of the sectors in the sector working set. Furthermore, it states
that the page referenced at time $t$ will not be in the page working set if
sector $j$ is not grouped into a page with one of the sectors in the sector
working set.

FACT 2.

Let $S^t = S_j$ and let $S_j \in W_s(t-1,T)$. Then $P^t = P_{S_j} \in W_p(t-1,T)$.
Fact 2 also follows immediately from the definition of $W_s(t,T)$ and
$W_p(t,T)$.

Fact 2 states that, when sector $j$ is referenced at time $t$ and sector
$j$ is in the sector working set, then the page referenced at time $t$ will
be in the page working set.

FACT 3. 

We want the entry $W_{ij} + W_{ji}$ in the intersector reference model to be
the number of page fetches which will go away if sector $i$ and sector $j$
are grouped into the same page.

Using the above three facts as a basis, we present the procedure
for constructing the intersector reference model, $W = [W_{ij}]$, for $i, j =
1, 2, \ldots, m$. At each instant of time, $t$, for $1 \le t \le L$, do the following.

Step 1. If $S^t = S_j$ and $S_j \notin W_s(t-1,T)$, then increment $W_{ij}$ by 1 for all
$S_i \in W_s(t-1,T)$.

Step 2. If $S^t = S_j$ and $S_j \notin W_s(t-1,T)$, then increment $W_{jj}$ by 1.

Step 3. If $S^t = S_j$ and $S_j \in W_s(t-1,T)$, then no increment is required.

Simply stated, the above procedure works as follows. If sector $j$ is
not in the sector working set when it is referenced, then increment its
connectivity strength with all the sectors in the sector working set.
Moreover, if sector $j$ is in the sector working set when it is referenced,
then do not change the strength of connection between sector $j$ and the
other sectors.

We observe that the value of the intersector strength becomes

$$W_{ij} = \sum_{t=1}^{L} k(i,j,t),$$

$$\text{where } k(i,j,t) = \begin{cases} 1 & \text{if } S^t = S_j \notin W_s(t-1,T) \text{ and } S_i \in W_s(t-1,T), \\ 1 & \text{if } S^t = S_j \notin W_s(t-1,T) \text{ and } i = j, \\ 0 & \text{otherwise.} \end{cases}$$

Note that $W_{ij} + W_{ji}$ is the number of page fetches which will go away
if sectors $i$ and $j$ are grouped together in the same page. The sum of the
diagonal elements of the intersector reference model, $\sum_{j=1}^{m} W_{jj}$, is
the number of sector fetches which occurred for the sector working set.
This will also be the number of page fetches for the page working set if
no sectors are combined together in pages. The number of page fetches
after combining only sectors $i$ and $j$ will be $\sum_{j=1}^{m} W_{jj} - W_{ij} - W_{ji}$.

FACT 4. 

If exactly two sectors are grouped into each of the $n$ logical pages,
then the number of times a page is referenced and not found in the page
working set is given by

$$\sum_{j=1}^{m} W_{jj} - \sum_{k=1}^{n} \left( W_{ij} + W_{ji} \right), \quad \text{where } i, j \in P_k \text{ for } 1 \le k \le n.$$

Fact 4 follows directly from the construction of $W_{ij}$, since $W_{ij} +
W_{ji}$ is the number of page fetches which are eliminated by grouping $i$ and
$j$ together in the same page, and since grouping $i$ and $j$ together does not
affect the value of $W_{kl} + W_{lk}$ for grouping any other two sectors
$k$ and $l$ together in a different page.
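
The three-step construction of $W$ and the count given by Fact 4 can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, sectors are numbered from 0, and $W_s(t-1,T)$ is assumed to be the set of distinct sectors among the $T$ references preceding time $t$, with the same window used for the page working set.

```python
def ws_matrix(trace, m, T):
    """Working set intersector reference model W for window size T."""
    W = [[0] * m for _ in range(m)]
    for t, j in enumerate(trace):
        working_set = set(trace[max(0, t - T):t])
        if j not in working_set:
            W[j][j] += 1              # Step 2: a sector fetch of j
            for i in working_set:
                W[i][j] += 1          # Step 1: i was resident at the fetch
        # Step 3: j already in the working set, nothing to increment
    return W


def fact4_page_fetches(W, pages):
    """Page fetches predicted by Fact 4 for pages of exactly two sectors."""
    sector_fetches = sum(W[j][j] for j in range(len(W)))
    saved = sum(W[i][j] + W[j][i] for i, j in pages)
    return sector_fetches - saved


def page_working_set_fetches(trace, pages, T):
    """Directly simulated page fetches under page working set management."""
    page_of = {s: p for p, pair in enumerate(pages) for s in pair}
    page_trace = [page_of[s] for s in trace]
    return sum(1 for t, p in enumerate(page_trace)
               if p not in page_trace[max(0, t - T):t])
```

Under these assumptions, `fact4_page_fetches(ws_matrix(trace, m, T), pages)` and `page_working_set_fetches(trace, pages, T)` agree for any trace when every page holds exactly two sectors, which is the content of Fact 4.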

Unfortunately, we cannot extend Fact 4 to handle the case when more
than two sectors are allowed to be grouped into a page. This occurs
because the matrix, $W$, does not contain enough information to determine
the number of page fetches which will be eliminated by grouping three or
more sectors into a page. For example, $W_{jj}$ is the number of fetches of
sector $j$. $W_{ij}$ and $W_{kj}$ are the number of times that sector $i$ and sector
$k$, respectively, were in the working set when a fetch of $j$ was made. The
problem is that sectors $i$ and $k$ both may have been in the sector working
set at the time that a reference to $j$ caused a fetch. Let the number of
sector fetches of sector $j$, which will be resolved by grouping sectors $i$,
$k$, and $j$ together into a page, be denoted by $R_{ikj}$.
Then,

$$\mathrm{MAX}[W_{ij}, W_{kj}] \le R_{ikj} \le W_{ij} + W_{kj}.$$

We should point out at this time that the above relations can be
utilized in a clustering procedure. Suppose sectors $i$, $j$, and $k$ are
grouped together into a page. Then the unresolved sector fetches of $i$,
$j$, and $k$, denoted by $W'_{ijk}$, is the number of page fetches of this page
which will occur if no other sector is grouped with $i$, $j$, and $k$.
But

$$W'_{ijk} \le W_{ii} + W_{jj} + W_{kk} - \mathrm{MIN}[R_{ikj}] - \mathrm{MIN}[R_{ijk}] - \mathrm{MIN}[R_{jki}].$$

Note, also, from Fact 4, that
$W'_{ij} = W_{ii} + W_{jj} - W_{ij} - W_{ji}$, for the case of two sectors in a page.

Therefore, a clustering procedure could dynamically determine a lower
bound on the number of page fetches which could be resolved by adding
another sector to a page.
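
The bounds on $R_{ikj}$ can be illustrated by counting the three quantities directly from a trace. The sketch below uses hypothetical names and 0-based sector indices, and assumes the same window convention as before (the $T$ references preceding time $t$).

```python
def resolved_fetches(trace, T, i, k, j):
    """Return (W_ij, W_kj, R_ikj) for sectors i, k and j.

    R_ikj counts the fetches of j at which i or k (or both) was in the
    sector working set, so each such fetch is counted once.
    """
    w_ij = w_kj = r_ikj = 0
    for t, s in enumerate(trace):
        if s != j:
            continue
        working_set = set(trace[max(0, t - T):t])
        if j in working_set:
            continue                  # no sector fetch of j at this time
        w_ij += i in working_set
        w_kj += k in working_set
        r_ikj += (i in working_set) or (k in working_set)
    return w_ij, w_kj, r_ikj
```

Because a fetch of $j$ with both $i$ and $k$ resident is counted in both $W_{ij}$ and $W_{kj}$ but only once in $R_{ikj}$, the returned values always satisfy $\mathrm{MAX}[W_{ij}, W_{kj}] \le R_{ikj} \le W_{ij} + W_{kj}$.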

Since the value of $W_{ij}$ depends on the window size $T$ of the sector
working set $W_s(t,T)$, we need to elaborate on how one selects a "good
value" for $T$.

For real programs, we measured the improvement in paging performance
for restructured programs as a function of $T$. That is, we computed the
intersector reference model $W$ for various values of $T$, and for each $W$ we
restructured the program and computed its paging performance. The
detailed results of these experiments are presented in Chapter 6.
However, the significant characteristics of these results are as follows.
For a given program, the best improvements in paging performance, as a
function of $T$, occur for a rather large bandwidth of $T$ values. For
example, values of $1000 < T < 5000$ produced essentially the same and the
best improvement in paging performance of certain programs. For all
programs tested, the bandwidth of $T$ values that resulted in the best
improvement in paging performance was several thousand instructions;
however, the location of this bandwidth of $T$ values in the set of all $T$
values varied from program to program. A serendipitous observation of
the correlation between the bandwidth of good $T$ values and the "knee" of
the parachor curve of the sector fetch function, $FF_s(|M_s|, ST, F_d, R_o)$,
produced an interesting empirical result.

The parachor curve is a graph of $FF_s(|M_s|, ST, F_d, R_o)$ versus the
amount of primary memory $|M_s|$ available for execution. A typical
parachor curve for $FF_s$ is shown in Figure 5. The value of $FF_s$ is a
monotonically decreasing function of $|M_s|$. For most observed programs,
there is a threshold region at which,

a) if the amount of primary memory is decreased further, the number of
sector fetches increases very rapidly, and

b) if the amount of primary memory is increased further, the number of
sector fetches decreases very slowly.

This threshold region is depicted in Figure 5 and is called the knee
of the parachor curve. The values of $|M_s|$ in the knee of the parachor
curve indicate how many sectors are required to be in the primary memory to
maintain a "reasonable" level of performance.

Let the average sector working set size be denoted by $\bar{w}_s(T)$ and be
defined as

$$\bar{w}_s(T) = (1/L) \sum_{t=1}^{L} w_s(t,T).$$
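
The average sector working set size is straightforward to compute from a sector trace. The following is a minimal sketch with a hypothetical function name; it assumes that $W_s(t,T)$ holds the distinct sectors among the last $T$ references up to and including time $t$.

```python
def average_working_set_size(trace, T):
    """Average of w_s(t, T) over t = 1, ..., L for a sector trace."""
    total = 0
    for t in range(1, len(trace) + 1):
        # Distinct sectors among the last T references ending at time t.
        total += len(set(trace[max(0, t - T):t]))
    return total / len(trace)


if __name__ == "__main__":
    trace = [0, 1, 0, 1]
    for T in (1, 2, 3):
        print(T, average_working_set_size(trace, T))
```

Evaluating this function over a range of $T$ values gives the curve of average working set size against window size, which can then be compared with the memory sizes in the knee of the parachor curve.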

Now we present a method which identifies values of the window size $T$
for use in the construction of the intersector reference model $W$.

Experimental Result:

For all the programs we tested, the bandwidth of $T$ values which
resulted in the best improvement in paging performance corresponds to
those values of $T$ for which the average sector working set size $\bar{w}_s(T)$
was equal to

a) some value of $|M_s|$ in the knee of the parachor curve of
$FF_s(|M_s|, ST, F_d, R_o)$, or to

b) some value of $|M_s|$ slightly smaller than those values of $|M_s|$ found in
the knee of the parachor curve.

This experimental result was particularly handy in our research,
since we had already computed the parachor curve of $FF_s(|M_s|, ST, F_d, R_o)$
for use in establishing the lower bounds.

FIGURE 5.
Parachor Curve of $FF_s(|M_s|, ST, F_d, R_o)$

If the window size, $T$, is very small, for example $T = 1$, then the
value of $W_{ij}$ is much larger than the number of page fetches resolved by
grouping $i$ and $j$ together for most memory sizes. On the other hand, if
the value of $T$ is very large, for example 25,000, then the value of $W_{ij}$
is much smaller than the number of page fetches resolved by grouping $i$
and $j$ together for most memory sizes. However, if $T$ is such that the
average working set size is in the knee of the parachor curve, then the
value of $W_{ij}$ represents the intersector activity when the program has
just enough space to execute efficiently. This corresponds to the
intersector activity that we want to represent in the intersector
reference model, $W$.

In addition to the above intersector reference model based on the
sector working set, we decided to investigate the potential of the
following model. Let the intersector reference model, $W^*$, be an $m \times m$ matrix
defined as follows:

$$W^*_{ij} = \sum_{t=1}^{L} k(i,j,t) \quad \text{for } i, j = 1, 2, \ldots, m,$$

$$\text{where } k(i,j,t) = \begin{cases} 1 & \text{if } S^t = S_j \in W_s(t-1,T) \text{ and } S_i \in W_s(t-1,T); \\ 0 & \text{otherwise.} \end{cases}$$

The value of $W^*_{ij}$ is the number of times that sector $j$ was
referenced when sector $j$ and sector $i$ were both in the sector working
set. Therefore, if the value of $W^*_{ij}$ is large, then $S_i$ and $S_j$ were in
the sector working set together many times. Note that $W^*_{jj}$ is the number
of references to sector $j$ which will not cause a page fetch. In
contrast, $W_{jj}$ of the previous model is the number of references to sector
$j$ which will cause a page fetch unless $S_j$ is grouped with some $S_i$.
However, $W^*_{ij}$ does measure the tendency for sectors $i$ and $j$ to be found
in the sector working set together. Clustering procedures which group
sectors into pages with large $W^*_{ij}$ values will tend to reduce the size of
the page working set and hence increase the locality of the restructured
program.
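
For comparison with the first working set model, the matrix $W^*$ can be accumulated in the same single pass over the trace. This sketch uses a hypothetical function name and 0-based sector indices, and again assumes that $W_s(t-1,T)$ is the set of distinct sectors among the $T$ references preceding time $t$.

```python
def ws_star_matrix(trace, m, T):
    """Co-residence model W*: counts references to j that hit the working set.

    W_star[i][j] is incremented when j is referenced while both i and j are
    in the sector working set; the diagonal counts the hits on j itself.
    """
    W_star = [[0] * m for _ in range(m)]
    for t, j in enumerate(trace):
        working_set = set(trace[max(0, t - T):t])
        if j in working_set:
            for i in working_set:     # includes i == j
                W_star[i][j] += 1
    return W_star


if __name__ == "__main__":
    for row in ws_star_matrix([0, 1, 0, 1], 2, 2):
        print(row)
```

Note that the condition is the complement of the one used for $W$: here only references that find sector $j$ already in the working set contribute.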

We conclude with a few comments about the intersector reference
models based on the sector working set, $W_s(t,T)$. The HG intersector
reference model, $H$, is a special case of the intersector reference model,
$W$. They are the same when $W$ is computed from a sector working set with
window size, $T$, equal to one. The notion of using sector working sets to
define the strength of connection between blocks has been investigated
concurrently but independently of this work by Masuda [M6] and Ferrari
[F1]. Masuda's use of block working sets is quite different from this
work, while Ferrari's is similar in some aspects.



4.2.3 LRU Stack Intersector Reference Model 

The "LRU sector stack" will be used to define the strength of
connection between sectors for a given sector trace.

Consider demand fetch, LRU replacement on a sector trace,
$ST = S^1, S^2, \ldots, S^t, \ldots, S^L$, over a set of $m$ relocatable sectors.
From Chapter 2, we know that LRU satisfies the inclusion property, i.e.,

$$M_s^t(1) \subseteq M_s^t(2) \subseteq \cdots \subseteq M_s^t(m^t) = M_s^t(m^t+1) = M_s^t(m^t+2) = \cdots$$

where $M_s^t(j)$ is the contents of the sector memory $M_s$ at time $t$ when the
size of $M_s$ is $j$ sector frames (i.e., $|M_s| = j$), and $m^t$ is the number
of distinct sectors referenced in the sequence $S^1, S^2, \ldots, S^t$.

Because of the inclusion property, the primary memory contents $M_s$
at any time $t$ and for all capacities can be represented in the following
terse and useful way. We order the distinct set of sectors in the
sequence $S^1, S^2, \ldots, S^t$ into a list called the LRU sector stack, which
is defined as $SS^t = SS^t(1), SS^t(2), \ldots, SS^t(m^t)$, where
$SS^t(i) = M_s^t(i) - M_s^t(i-1)$. Note that

$$M_s^t(i) = \begin{cases} \{SS^t(1), SS^t(2), \ldots, SS^t(i)\} & \text{for } i \le m^t; \\ \{SS^t(1), SS^t(2), \ldots, SS^t(m^t)\} & \text{for } i > m^t. \end{cases}$$

The LRU sector stack has no entries at time $t = 0$. The top of the
stack is defined as $SS^t(1)$, while the bottom of the stack is defined as
$SS^t(m^t)$.

The LRU sector stack, just after sector reference $S^t$ at time $t$, is
simply the list of the set of $m^t$ sectors of the program ordered
according to recency of usage; i.e., $SS^t(k)$ is the $k$th most recently
used sector relative to $S^t$.

The position of sector $j$ in the stack just before sector reference
$S^t$, at time $t$, is defined as the sector stack distance and is denoted
by $\Delta^t_j$. Furthermore, $\Delta^t_j = \infty$ if $S_j$ has not been referenced. Thus,

$$\Delta^t_j = \begin{cases} k & \text{if } SS^{t-1}(k) = S_j,\ 1 \le k \le m^{t-1}; \\ \infty & \text{otherwise.} \end{cases}$$

From the definition of stack distances, we observe that $S^t = S_j$
will cause a sector fetch under demand fetch, LRU replacement unless
$\Delta^t_j \le |M_s|$, where $|M_s|$ is the number of sector frames in the sectored
primary memory.

Now, two facts are presented which relate the sector stack distances
at time $t$ with the parameters of a paged virtual memory system using
demand fetch and LRU replacement on the page trace
$P = p^1, p^2, \ldots, p^t, \ldots, p^L$ in a primary memory of $|M_p|$ page frames.
The page, $p^t$, referenced at time $t$ must contain the sector $S^t$,
referenced at time $t$.

FACT 1.

Let $S^t = S_j$, and let $\Delta^t_j > |M_p|$. Then $p^t \in M_p^{t-1}$ (that is, the
referenced page is already in primary memory) if $S_j$ is
grouped into the same page with some $S_i$ where $\Delta^t_i \le |M_p|$.

Proof.

Note that $\Delta^t_j > |M_p|$ states that the sector stack distance at time
$t$ to sector $j$ is greater than the number of page frames in $M_p$.
Suppose $S_j$ is grouped with some $S_i$, where $\Delta^t_i \le |M_p|$. Then the sector,
$S_i$, is among the $|M_p|$ most recently referenced sectors. Therefore, the
page containing $S_i$ must be among the $|M_p|$ most recently referenced pages,
since we are assuming that the sectors are smaller than pages.

FACT 2.

Let $S^t = S_j$, and let $\Delta^t_j \le |M_p|$. Then $p^t \in M_p^{t-1}$.
Fact 2 follows from the argument applied to $S_i$ in Fact 1. We can use
Facts 1 and 2 as a basis for defining the strength of connection between
sectors. Fact 2 states that, if $S^t = S_j$ and $\Delta^t_j \le |M_p|$, then $S_j$ will
not cause a page fetch; hence, for such references, the strength of
connection between $S_j$ and the other sectors need not be incremented.
However, if $\Delta^t_j > |M_p|$, then $S_j$ will not cause a page fetch when it is
grouped with any sector $S_i$ with $\Delta^t_i \le |M_p|$. For the latter case, the
strength of connection between $S_j$ and all $S_i$ with $\Delta^t_i \le |M_p|$ will be
incremented by 1.

Now, we define the intersector reference model based on the LRU
sector stack distance as an m x m matrix, U, where

\[ U_{ij} = \sum_{t} V(i,j,t), \]

the sum being taken over all times t of the sector trace, and

\[
V(i,j,t) =
\begin{cases}
1 & \text{if } S^t = S_j \text{ and } \Delta^t_j > D \text{ and } \Delta^t_i \le D; \\
1 & \text{if } S^t = S_j \text{ and } \Delta^t_j > D \text{ and } i = j; \\
0 & \text{otherwise.}
\end{cases}
\]

If the value of D is one, then the intersector reference model, U,
is the same as the intersector reference model of Hatfield and Gerald.
However, we got the best results (fewest page fetches after
restructuring) with values of D equal to the number of sectors, |Ms|,
corresponding to the high side of the knee of the parachor curve
FFs(|Ms|, ST, Fd, Ro). Figure 6 shows the typical shape of FFs as a
function of |Ms| and the range of the values of D which gave excellent
results for all real programs we investigated.
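
As an illustration, the matrix U can be accumulated in one pass over a
sector trace. The following is a minimal sketch under stated
assumptions, not the implementation used in the experiments: the name
lru_stack_matrix and the list-of-lists representation are hypothetical.

```python
def lru_stack_matrix(trace, sectors, D):
    """Build the LRU sector stack matrix U for stack distance D.

    On each reference to sector j whose stack distance exceeds D, the
    entry U[i][j] is incremented for every sector i whose stack distance
    is at most D, and the diagonal entry U[j][j] is incremented as well.
    """
    index = {s: n for n, s in enumerate(sectors)}
    m = len(sectors)
    U = [[0] * m for _ in range(m)]
    stack = []  # stack[0] is the most recently used sector
    for s in trace:
        j = index[s]
        distance = stack.index(s) + 1 if s in stack else float("inf")
        if distance > D:
            for other in stack[:D]:  # sectors with stack distance <= D
                U[index[other]][j] += 1
            U[j][j] += 1
        if s in stack:
            stack.remove(s)
        stack.insert(0, s)
    return U
```

The result is generally not symmetric; the clustering procedures of
Chapter 5 symmetrize it by replacing Cij with (Cij + Cji)/2.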

One explanation which provides some insight into why the values of D 
corresponding to the knee region of FFs produce reasonable values for the 
strengths of connection between sectors is as follows. 

If D is very small, say 1, then the strength of connection between 
two sectors, Uij, is proportional to the number of page fetches only when 
the paged primary memory has one page frame. However, most large 
programs will not execute efficiently when allocated one page frame. If 
the value of |Mp| for efficient execution is much larger than D = 1, then
the strength of connection Uij for some i and j may not even be loosely
proportional to the number of page fetches resolved when they are grouped
together. For very small values of D, Uij may be excessively larger than
the number of page fetches which are resolved by grouping i and j
together; for very large values of D, Uij may be excessively smaller
than the number of page fetches resolved when i and j are placed
together. Values of D in the region of the knee of the curve represent
the intersector activity when the program has just enough space to
execute efficiently. This is the intersector activity that we want the
intersector reference model to measure.

CHAPTER 5



CLUSTERING PROCEDURES 



5.1 Introduction 

The purpose of this chapter is to present the automatic clustering 
methods which were used in conjunction with the intersector reference 
models to restructure programs. The experimental results which show the
effect of these clustering techniques on the paging performance of 
restructured programs are presented in Chapter 6. 



5.2 Clustering Procedures 

The clustering methods presented in this chapter may be applied to
any of the intersector reference matrix models of Chapter 4. Hence, we
will denote any of these intersector reference models with the generic
C = [Cij]. In those cases where a particular intersector reference
model is needed, the notation of Chapter 4 will be used.

We know of no efficient procedure to produce and prove the optimal
partition of sectors into pages to maximize the sum of the intersector
connections Cij within all pages. Several clustering procedures based
on heuristic approaches are presented in this chapter which have the
following significant properties. First, they are completely automatic;
that is, these procedures are not based on manual or "eyeball"
reorderings. Second, all these procedures produced restructured
programs which showed substantial improvements in their paging
performance. Third, these clustering procedures are quite fast.

The technique of the following clustering procedures is to take an
intersector reference model of intersector bond strengths and cluster
relocatable sectors into pages such that the sum of the sector bonds
within pages tends to be maximized.



5.3 Nearest Neighbor Methods 

In this section, we present several hierarchical methods which 
cluster the nearest two clusters under a specified bond strength 
definition one after another. 

Given any two clusters of relocatable sectors, Gx and Gy, the 
intercluster bond is denoted by B(x,y). Several intercluster bond
definitions are given below; then a clustering procedure is defined
over the intercluster bonds.

In the following definitions, the intersector reference matrix,
C = [Cij], is assumed to be symmetric. If the intersector reference
matrix is not symmetric, then each occurrence of Cij should be replaced
with (Cij + Cji)/2. The notation |Gx| denotes the size of cluster Gx in
bytes, and N denotes the page size in bytes.

A. Constrained Nearest Neighbor Bond 

The Constrained Nearest Neighbor bond, CNN, between any two 
clusters Gx and Gy is defined as 

\[
B(x,y) =
\begin{cases}
\max \{ C_{ij} : i \in Gx,\ j \in Gy \} & \text{when } |Gx| + |Gy| \le N; \\
\text{undefined} & \text{when } |Gx| + |Gy| > N.
\end{cases}
\]

B. Constrained Farthest Neighbor Bond

The Constrained Farthest Neighbor bond, CFN, between any two
clusters, Gx and Gy, is defined as

\[
B(x,y) =
\begin{cases}
\min \{ C_{ij} : i \in Gx,\ j \in Gy \} & \text{when } |Gx| + |Gy| \le N; \\
\text{undefined} & \text{when } |Gx| + |Gy| > N.
\end{cases}
\]

C. Constrained Average Neighbor Bond

The Constrained Average Neighbor bond, CAN, between any two
clusters, Gx and Gy, is defined as

\[
B(x,y) =
\begin{cases}
(1/n_{xy}) \sum_{i \in Gx} \sum_{j \in Gy} C_{ij} & \text{when } |Gx| + |Gy| \le N; \\
\text{undefined} & \text{when } |Gx| + |Gy| > N.
\end{cases}
\]

Here $n_{xy}$ is the number of Cij > 0 with i ∈ Gx, j ∈ Gy. Note that
$n_{xy}$ is the number of arcs between Gx and Gy, and it is not the sum of
the values on these arcs.

D. Constrained Average Neighbor Weighted Bond

The Constrained Average Neighbor Weighted bond, CANW, between any
two clusters, Gx and Gy, is defined as

\[
B(x,y) =
\begin{cases}
n_{xy} \cdot (1/n_{xy}) \sum_{i \in Gx} \sum_{j \in Gy} C_{ij} & \text{when } |Gx| + |Gy| \le N; \\
\text{undefined} & \text{when } |Gx| + |Gy| > N.
\end{cases}
\]

Hence,

\[ B(x,y) = \sum_{i \in Gx} \sum_{j \in Gy} C_{ij} \quad \text{when } |Gx| + |Gy| \le N. \]

A clustering procedure is now defined for use with any one of the 
above definitions of B(x,y). 

First, choose any one of the above definitions of B(x,y). Second, 
partition the m relocatable sectors of a program into exactly m 
clusters, where each cluster contains one sector. Then, at each step in
the clustering process, the nearest two clusters are combined to form a
new cluster. The nearest two clusters are defined to be the two
clusters Gx and Gy which have the largest value of B(x,y). When the sum
of the size of the two clusters becomes larger than the page size in the
clustering process, these two clusters are not considered to be
connected; that is, their bond strength is undefined. The process
comes to an end when new clusters cease to appear.

When the above clustering procedure is applied to the Constrained
Nearest Neighbor bond definition of B(x,y), it will be referred to as
the CNN procedure; when applied to the CAN definition of B(x,y), it
will be referred to as the CAN procedure, etc.

All of these clustering methods are computationally fast, easy to 
implement, and they tend to group the sectors with the strongest 
intersector strengths, Cij, into the same page. Hence, they tend to 
minimize the interaction of sectors clustered into different pages. 
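
The nearest neighbor procedure and the four constrained bond
definitions can be sketched as follows. This is an illustrative sketch
only. The function names bond and cluster are hypothetical, sectors are
identified by their index in C, and two clusters whose bond is zero are
treated as unconnected and are never combined, which is an assumption
about how "cease to appear" is to be read.

```python
from itertools import combinations


def bond(C, gx, gy, kind):
    """Intercluster bond B(x,y) between clusters gx and gy (lists of indices)."""
    # Symmetrize on the fly, as prescribed for nonsymmetric matrices.
    vals = [(C[i][j] + C[j][i]) / 2 for i in gx for j in gy]
    if kind == "CNN":
        return max(vals)
    if kind == "CFN":
        return min(vals)
    total = sum(vals)
    if kind == "CANW":
        return total
    if kind == "CAN":
        arcs = sum(1 for v in vals if v > 0)  # number of arcs, not their sum
        return total / arcs if arcs else 0.0
    raise ValueError(kind)


def cluster(C, sizes, page_size, kind="CANW"):
    """Repeatedly combine the two clusters with the largest defined bond."""
    clusters = [[i] for i in range(len(sizes))]
    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            gx, gy = clusters[a], clusters[b]
            if sum(sizes[i] for i in gx + gy) > page_size:
                continue  # bond undefined: the pair would not fit in a page
            strength = bond(C, gx, gy, kind)
            if strength > 0 and (best is None or strength > best[0]):
                best = (strength, a, b)
        if best is None:
            return clusters  # no new cluster can appear
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # keeps the order of joining
        del clusters[b]
```

Passing a page size larger than the whole program gives the behavior
of an unconstrained variant, since no pair is then rejected on size.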

The CNN, CFN, and CAN procedures are variations of clustering 
procedures which are widely used in the field of multivariate analysis. 
The Constrained Average Neighbor Weighted bond, CANW, procedure was
developed in this research. In fact, we experimented with several
weighted versions of the CNN, CFN and CAN procedures. However, the CANW
procedure consistently produced program structures which required fewer
page fetches than the program structures produced by the CNN, CFN, and
CAN procedures or by any of the other weighted versions we examined.
One explanation for the success of the CANW procedure is that at each
step it combines the two clusters which have the greatest total
intersector connections between them.

In the above Constrained Neighbor bond definitions, CNN, CFN, CAN,
and CANW, the constraint |Gx| + |Gy| ≤ N insures that the size of a
cluster never exceeds the page size. However, natural clusters of
sectors may in reality be larger or smaller than a page size. It is of
course conceivable to make clusters covering several pages without any
consideration of page sizes and to assign each of them to several
contiguous pages. In order to evaluate the merits of allowing clusters
to become any natural size, we experimented with

a) the Unconstrained Nearest Neighbor bond, UNN,

b) the Unconstrained Farthest Neighbor bond, UFN,

c) the Unconstrained Average Neighbor bond, UAN, and

d) the Unconstrained Average Neighbor Weighted bond, UANW,

where UNN, UFN, UAN, and UANW are defined to be exactly the same as CNN,
CFN, CAN, and CANW, respectively, with the exception that the constraint
|Gx| + |Gy| ≤ N is not present in the unconstrained cases. That is, in
the unconstrained cases, clusters may be combined independently of their
sizes.

The clustering procedure for the constrained clusters had to be 
modified slightly in order to be applicable for the unconstrained 
clusters. The clustering procedure for the unconstrained clusters is as
follows.

Choose any one of the unconstrained definitions of B(x,y).
Partition the m relocatable sectors of a program into exactly m
clusters, where each cluster contains one sector. Then, at each step in
the clustering process, the nearest two clusters (i.e., the two with the
largest value of B(x,y)) are combined. Now we will define what we mean
by combine.

Let the two clusters which are to be combined at any step of the 
clustering process be denoted by 

\[ Gx = Sx_1, Sx_2, \ldots, Sx_i \quad \text{and} \]

\[ Gy = Sy_1, Sy_2, \ldots, Sy_j, \]

where the cluster Gx is defined to be the ordered list of i sectors, and
the cluster Gy is defined to be the ordered list of j sectors. The
combination of the clusters Gx + Gy is defined to be the ordered list of
i + j sectors

\[ Gx + Gy = Sx_1, Sx_2, \ldots, Sx_i, Sy_1, Sy_2, \ldots, Sy_j. \]

Since each cluster starts out with one sector, the above definition 
of combining two clusters insures that the relative order in which 
sectors are clustered is preserved. This is important in the 
unconstrained case, because the clustering procedure ends when all the
clusters which are connected are grouped into one giant cluster, which
could be the whole program.

Note that the order of the sectors in the constrained clusters is
not important, because a constrained cluster will always fit into a
page.



5.4 Hatfield and Gerald Method 

The Hatfield and Gerald clustering procedure can be applied to any 
intersector reference matrix model, C = [Cij]. The HG clustering
procedure is defined in detail in [H1] and is briefly summarized below.
Let

\[ E = [E_{ij}], \quad i,j = 1,2,\ldots,m \quad \text{(m is the number of sectors)}, \]

where

\[
E_{ij} =
\begin{cases}
-C_{ij} & \text{when } i \ne j; \\
\sum_{k=1}^{m} C_{ik} + 2m & \text{when } i = j.
\end{cases}
\]

The inverse matrix of E is calculated, then a row in the inverse is 
chosen, and a set of sectors in that row are clustered into a page, and 
the process is iterated until all sectors are assigned. 

We thank Don Hatfield for providing a copy of his restructuring
program for use in our restructuring experiments.

5.5 Sector Interchange Procedure

The sector interchange procedure, SIP, is developed in this 
section. The SIP begins with the set of m relocatable sectors of a 
program partitioned into n blocks. That is, assume that a partition, $\Pi$,
of the set of sectors, $\{S_1, S_2, \ldots, S_m\}$, making up a program is given.
Let $\Pi$ be denoted by

\[ \Pi = \{\Pi_1, \Pi_2, \ldots, \Pi_n\}, \]

where $|\Pi_j|$ is the number of sectors in the j-th block of $\Pi$.

The blocks, $\Pi_x$, of $\Pi$ may represent the logical pages of a program,
where the sum of the sizes of the sectors making up a block of $\Pi$ is less
than the page size, or the blocks of $\Pi$ may represent natural clusters of
sectors, where the sum of the sizes of the sectors making up a block may
be greater than a page size.

The basic strategy of the sector interchange procedure, SIP, is to
reassign sectors to blocks of $\Pi$ by exchanging two sectors of different
blocks when the exchange provides a positive contribution to the sum of
the sector connections within blocks. In order to be more precise, we
need to define a few terms. Let

C = [Cij] be a symmetric intersector reference matrix for i,j ≤ m, and
$P = \{S_1, S_2, \ldots, S_m\}$ denote the set of sectors making up a program.

Definitions:

The complement of $\Pi_x$ is denoted $-\Pi_x$, and

\[ -\Pi_x = \{ \Pi_j \in \Pi : \Pi_j \ne \Pi_x \}. \]

Let $S_i \in \Pi_x$; then the intrablock bond of sector i, Si, with block
$\Pi_x$ is defined as

\[ B(i,\Pi_x) = \sum_{S_j \in \Pi_x} C_{ij}. \]

Let $S_i \in \Pi_x$ and $S_i \notin \Pi_y$; then the interblock bond of sector i
with block $\Pi_y$ is defined as

\[ B(i,\Pi_y) = \sum_{S_j \in \Pi_y} C_{ij}. \]

Let $S_i \in \Pi_x$; then the interblock bond of sector i with all other
blocks is defined as

\[ B(i,-\Pi_x) = \sum_{S_j \notin \Pi_x} C_{ij}. \]

The quality of the bond for the ith sector is defined as

\[ q_\Pi(i) = B(i,\Pi_x) - B(i,-\Pi_x), \quad \text{where } S_i \in \Pi_x. \]

The quality of a sector partition $\Pi$ is defined as

\[ Q_\Pi = \sum_{S_i \in P} q_\Pi(i). \]

The goal of the sector interchange procedure, SIP, is to maximize
the quality $Q_\Pi$ by interchanging sectors between blocks of the
partition. We now present an efficient method to find an optimal
assignment of sectors to blocks under the constraint that each
interchange consists of exchanging a sector of one block with a sector
of another block.

Lemma 6

Let $S_i \in \Pi_x$ and $S_j \in \Pi_y$. If Si and Sj are interchanged, the net
gain in the quality $Q_\Pi$, denoted by $\Delta Q_\Pi(i,j)$, is given by

\[ \Delta Q_\Pi(i,j) = 4[B(j,\Pi_x) - B(j,\Pi_y) + B(i,\Pi_y) - B(i,\Pi_x) - 2C_{ij}]. \]

Proof:

Let $S_i \in \Pi_x$, $S_j \in \Pi_y$ and $\Pi_x, \Pi_y \in \Pi$. Now, interchange sectors Si
and Sj, which produces the new partition $\Pi'$. Then

\[
\Delta Q_\Pi(i,j) = Q_{\Pi'} - Q_\Pi
= \sum_{S_k \in P} q_{\Pi'}(k) - \sum_{S_k \in P} q_\Pi(k)
= \sum_{S_k \in P} [q_{\Pi'}(k) - q_\Pi(k)].
\]

Let $\Delta q(k) = q_{\Pi'}(k) - q_\Pi(k)$.
Now we consider 5 cases.

Case 1. $\Delta q(k) = 2(C_{kj} - C_{ki})$ for all $k \in \Pi_x$, $k \ne i$.

Case 2. $\Delta q(k) = 2(C_{ki} - C_{kj})$ for all $k \in \Pi_y$, $k \ne j$.

Case 3. $\Delta q(k) = 0$ for all $k \in -(\Pi_y + \Pi_x)$.

Case 4. $\Delta q(i) = B(j,\Pi_x) - B(j,\Pi_y) - B(j,\Pi_x + \Pi_y) - 2C_{ij}
                  - B(i,\Pi_x) + B(i,\Pi_y) + B(i,\Pi_x + \Pi_y)$.

Case 5. $\Delta q(j) = B(j,\Pi_x) - B(j,\Pi_y) + B(j,\Pi_x + \Pi_y) - 2C_{ij}
                  - B(i,\Pi_x) + B(i,\Pi_y) - B(i,\Pi_x + \Pi_y)$.

Now,

\[
\begin{aligned}
\Delta Q_\Pi(i,j) &= \sum_{k \in \Pi_x,\, k \ne i} \Delta q(k)
                   + \sum_{k \in \Pi_y,\, k \ne j} \Delta q(k)
                   + \Delta q(i) + \Delta q(j) \\
 &= 2[B(j,\Pi_x) - B(i,\Pi_x) - C_{ij}]
  + 2[B(i,\Pi_y) - B(j,\Pi_y) - C_{ij}] + \Delta q(i) + \Delta q(j) \\
 &= 4[B(j,\Pi_x) - B(j,\Pi_y) + B(i,\Pi_y) - B(i,\Pi_x) - 2C_{ij}].
\end{aligned}
\]

QED.



Now we present a Lemma which permits us to quickly select the Sj and Si
for exchange.

Lemma 7:

If $\Delta Q_\Pi(i,j)$ is positive, then $q_\Pi(i) + q_\Pi(j)$ is negative.

Proof:

\[
\begin{aligned}
q_\Pi(i) &= B(i,\Pi_x) - B(i,-\Pi_x) \\
         &= B(i,\Pi_x) - B(i,\Pi_y) - B(i,-(\Pi_x + \Pi_y));
\end{aligned}
\]

similarly,

\[ q_\Pi(j) = B(j,\Pi_y) - B(j,\Pi_x) - B(j,-(\Pi_x + \Pi_y)). \]

From Lemma 6,

\[ \Delta Q_\Pi(i,j) = 4[B(j,\Pi_x) - B(j,\Pi_y) + B(i,\Pi_y) - B(i,\Pi_x) - 2C_{ij}]. \]

Hence,

\[ \Delta Q_\Pi(i,j) = -4[q_\Pi(i) + q_\Pi(j) + B(i,-(\Pi_x + \Pi_y)) + B(j,-(\Pi_x + \Pi_y)) + 2C_{ij}]. \]

But $B(i,-(\Pi_x + \Pi_y)) + B(j,-(\Pi_x + \Pi_y)) + 2C_{ij} \ge 0$, and
$\Delta Q_\Pi(i,j) > 0$. Thus $q_\Pi(i) + q_\Pi(j) < 0$.
QED.

FACT 1:

The maximum value of $\Delta Q_\Pi(i,j)$ is $-4(q_\Pi(i) + q_\Pi(j))$.
This fact follows directly from the proof of Lemma 7.

FACT 2:

If $\Delta Q_\Pi(i,j) > 0$, then (i,j) must be an element of the
interchange set, $I_\Pi$, where

\[ I_\Pi = \{ (i,j) : q_\Pi(i) + q_\Pi(j) < 0,\ i,j \in P \}. \]

This fact follows immediately from Lemma 7.

Now we iteratively define the sector interchange procedure, SIP.
We assume that an initial partition, $\Pi^0$, and an intersector reference
matrix, C, are given.

The operations performed in the kth pass are these: 

a. Compute the set $I_{\Pi^{k-1}}$.

b. Select a pair (i,j) such that
$\Delta Q_{\Pi^{k-1}}(i,j) \ge \Delta Q_{\Pi^{k-1}}(u,v)$ for
all $(u,v) \in I_{\Pi^{k-1}}$.

c. If $\Delta Q_{\Pi^{k-1}}(i,j) > 0$, then interchange sectors i and j
of $\Pi^{k-1}$ to get $\Pi^k$, and go to the (k + 1)th pass.

If $\Delta Q_{\Pi^{k-1}}(i,j) \le 0$, then stop with $\Pi^{k-1}$.

The SIP has to terminate at some pass k, since Cij is finite and each
interchange strictly increases the quality of the partition. If it
terminates on the kth step, then $\Pi^{k-1}$ is optimum in the sense that
no pairwise interchange can increase the value of $Q_{\Pi^{k-1}}$. This is
obvious, since $I_{\Pi^{k-1}}$ contains all the possible candidates (i,j) that
could possibly make $\Delta Q_{\Pi^{k-1}}$ positive, and since at termination
$\Delta Q_{\Pi^{k-1}}(u,v) \le 0$ for all $(u,v) \in I_{\Pi^{k-1}}$.

In each pass of the previous algorithm, by keeping the list of
sectors in the set $I_{\Pi^{k-1}}$ sorted and using Fact 1, the algorithm can be
made much more efficient.

The sector interchange procedure, SIP, is particularly useful when
one has a partition, $\Pi$, where the blocks of $\Pi$ represent natural clusters
of sectors. Another application of SIP is in the evaluation of breaking
up huge sectors into smaller parts by reprogramming.
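
The passes of the sector interchange procedure can be sketched as
follows. This is a plain illustration, not the sorted-list
implementation suggested above: it scans every pair of sectors in
different blocks on each pass, using Lemma 7 only to skip pairs outside
the interchange set. The names sector_quality, quality, gain and sip
are hypothetical, C is assumed symmetric and nonnegative, and a
sector's bond with itself is assumed to be excluded from B, which is
what the gain formula of Lemma 6 requires.

```python
def sector_quality(C, blocks):
    """Return {i: q(i)}, the intrablock bond minus the interblock bond of i."""
    q = {}
    for block in blocks:
        members = set(block)
        for i in block:
            intra = sum(C[i][k] for k in members if k != i)
            inter = sum(C[i][k] for k in range(len(C)) if k not in members)
            q[i] = intra - inter
    return q


def quality(C, blocks):
    """Quality Q of a partition: the sum of q(i) over all sectors."""
    return sum(sector_quality(C, blocks).values())


def gain(C, bx, by, i, j):
    """Net gain in Q (Lemma 6) from interchanging i in bx with j in by."""
    def B(s, block):
        return sum(C[s][k] for k in block if k != s)
    return 4 * (B(j, bx) - B(j, by) + B(i, by) - B(i, bx) - 2 * C[i][j])


def sip(C, blocks):
    """Interchange the best pair on each pass until no pair has positive gain."""
    blocks = [list(b) for b in blocks]
    while True:
        q = sector_quality(C, blocks)
        best_gain, best_move = 0, None
        for x in range(len(blocks)):
            for y in range(x + 1, len(blocks)):
                for i in blocks[x]:
                    for j in blocks[y]:
                        if q[i] + q[j] >= 0:
                            continue  # not in the interchange set (Lemma 7)
                        g = gain(C, blocks[x], blocks[y], i, j)
                        if g > best_gain:
                            best_gain, best_move = g, (x, y, i, j)
        if best_move is None:
            return blocks
        x, y, i, j = best_move
        blocks[x][blocks[x].index(i)] = j
        blocks[y][blocks[y].index(j)] = i
```

Each interchange swaps one sector for another, so the number of
sectors per block is preserved; block sizes in bytes are not checked
in this sketch.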

An ongoing research project between the author and Don Hatfield of 
IBM is to evaluate the potential benefit of reprogramming and then 
restructuring a very large data base system. The rationale for 
reprogramming is to divide the very large sectors (over 16 pages long) 
into relocatable subsectors and then restructure the new program. 
Theorem 1 can be used to predict the theoretical best paging performance 
if the large data base program were broken up into exactly k sectors per 
page. Then, given an intersector reference matrix and a partition, $\Pi$,
of k sectors per block, the sector interchange procedure, SIP, can be 
used to restructure the program. 



5.6 Intercluster Bonding Method 

The purpose of the intercluster bonding method is to identify 
natural clusters of dense sector interactions. This task is 
accomplished by permuting the rows and columns of an intersector 
reference matrix model in such a way as to group the numerically larger 
matrix elements together.

The definition of the intercluster bond measure is given first,
then we illustrate the capability of this measure to cluster the larger
matrix elements together, and then we present a fast approximate method
of permuting the rows and columns of a given matrix such that the
intercluster bond measure tends to be maximized.

Given a symmetric intersector reference matrix C = [Cij] for
i,j = 1,2,...,m which represents the intersector activity between the m
relocatable sectors of a program, we define the intercluster bond
measure, ICB, as

\[ ICB(C) = \sum_{i=1}^{m} \sum_{j=1}^{m} C_{ij} (C_{i-1,j} + C_{i+1,j} + C_{i,j-1} + C_{i,j+1}), \]

where $C_{0,j} = C_{m+1,j} = C_{i,0} = C_{i,m+1} = 0$ by definition and Cij ≥ 0.
We point out that the bond strength between two nearest-neighbor
elements of C is their product.
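
A small sketch of the measure follows, assuming the matrix is held as a
list of lists. The helper names icb and permute are hypothetical; the
4 x 4 matrix in the usage note is an invented illustration and is not
one of the matrices of Example 1.

```python
def icb(C):
    """Intercluster bond measure: each element times its four neighbors."""
    m = len(C)

    def at(i, j):
        # Elements outside the matrix are zero by definition.
        return C[i][j] if 0 <= i < m and 0 <= j < m else 0

    return sum(
        C[i][j] * (at(i - 1, j) + at(i + 1, j) + at(i, j - 1) + at(i, j + 1))
        for i in range(m)
        for j in range(m)
    )


def permute(C, order):
    """Apply the same permutation to the rows and the columns of C."""
    return [[C[a][b] for b in order] for a in order]
```

For instance, with sectors a, b, c, d in that order and
C = [[4,0,3,0],[0,2,0,2],[3,0,4,0],[0,2,0,2]], no two nonzero elements
are adjacent and icb(C) is 0. Reordering to a, c, b, d with
permute(C, [0, 2, 1, 3]) brings the two clusters together and raises
the measure to 128.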

The intercluster bond measure, ICB, is defined so that a matrix C 
that has dense clusters of numerically large elements will have a large 
ICB when compared with the same matrix whose columns and rows are 
permuted such that numerically large elements are more uniformly 
distributed over the array cells. In order to illustrate the 
sensitivity of ICB(C) to the degree of clumpiness of the large values of
Cij, we present the following two simple examples. Example 1 shows the
same matrix with 5 different row and column permutations. Matrix C5,
which has the largest intercluster bond measure, contains two
noninteracting clusters. One cluster consists of the sectors a and c,
while the other cluster consists of the sectors b and d. The fact that
matrix C1 could be reordered to produce two noninteracting clusters is
not readily apparent even for this simple example. Example 2 shows a
slightly more complicated matrix. Matrix C4 of Example 2 is
characterized by a block checkerboard form, where the blocks of sectors
along the main diagonal represent the primary sector clusters and the
off-diagonal blocks indicate the intercluster interactions. Matrix C5,
which has the largest intercluster bond measure of Example 2, has the
same set of primary clusters as Matrix C4, but it differs from C4 in that
the clusters which interact the most are ordered adjacent to each other.
The intercluster bond measure, ICB, tends to be maximum when the most
strongly intraconnected sectors are clustered together and the most
strongly interconnected clusters are clustered together. We call ICB
the intercluster bond measure because it tends to cluster the
intercluster connections as well as cluster sectors.

In our experimental studies, sector orderings which produced the 
largest values for the intercluster bond measure provided as good as or 
better improvements in the paging performance than any other program 
restructuring method tested.

Example 1: [matrices C1 through C5 not reproduced]

    ICB(C2) = 258
    ICB(C5) = 1312

Example 2: [matrices C1 through C5 not reproduced]

    ICB(C2) = 1560
    ICB(C4) = 277B
    ICB(C5) = 3536

Note that the definition of ICB may be decomposed into the two 
parts as follows:

\[ ICB(C) = ICB(C_R) + ICB(C_C), \quad \text{where} \]

\[ ICB(C_R) = \sum_{i=1}^{m} \sum_{j=1}^{m} C_{ij} (C_{i-1,j} + C_{i+1,j}), \]

\[ ICB(C_C) = \sum_{i=1}^{m} \sum_{j=1}^{m} C_{ij} (C_{i,j-1} + C_{i,j+1}). \]

The value of $ICB(C_R)$ is the sum of the row bonds and the value of
$ICB(C_C)$ is the sum of the column bonds.

Property 1:

The values of the row bonds, $\sum_{j=1}^{m} C_{ij} (C_{i-1,j} + C_{i+1,j})$,
are not affected by any permutation of the m columns of C.

Proof:

Let $\lambda = [\lambda(1), \lambda(2), \ldots, \lambda(m)]$ denote any permutation
of the m columns of C producing the new matrix
$D = [D_{ij}] = [C_{i,\lambda(j)}]$.
Then, for any 1 ≤ i ≤ m,

\[
\sum_{j=1}^{m} C_{ij} (C_{i-1,j} + C_{i+1,j})
= \sum_{j=1}^{m} C_{i,\lambda(j)} (C_{i-1,\lambda(j)} + C_{i+1,\lambda(j)}).
\]

This is clearly true, since i is fixed over the summation of all j.
Thus, for every term in the summation on the left,
$C_{ij} (C_{i-1,j} + C_{i+1,j})$, there must be a value k, 1 ≤ k ≤ m,
such that

\[ C_{ij} (C_{i-1,j} + C_{i+1,j}) = C_{i,\lambda(k)} (C_{i-1,\lambda(k)} + C_{i+1,\lambda(k)}). \]

Property 2:

The values of the column bonds, $\sum_{i=1}^{m} C_{ij} (C_{i,j-1} + C_{i,j+1})$,
are not affected by any permutation of the m rows of C.
Proof is the same as that of Property 1.

Property 3:

$ICB(C_R) = ICB(C_C)$ for symmetric matrices C.

Proof:

\[
\begin{aligned}
ICB(C_R) &= \sum_{i=1}^{m} \sum_{j=1}^{m} C_{ij} (C_{i-1,j} + C_{i+1,j}) \\
         &= \sum_{i=1}^{m} \sum_{j=1}^{m} C_{ji} (C_{j,i-1} + C_{j,i+1}) \\
         &= ICB(C_C).
\end{aligned}
\]

Property 4:

The contribution to ICB(C) from any row is only affected by the two
adjacent rows. The contribution to ICB(C) from any column is only
affected by the two adjacent columns.

Property 4 is obvious, since the contribution to ICB(C) from any
row i is $\sum_{j=1}^{m} C_{ij} (C_{i-1,j} + C_{i+1,j})$ and from any
column j is $\sum_{i=1}^{m} C_{ij} (C_{i,j-1} + C_{i,j+1})$.

From properties 1 and 2, the maximization of ICB(C) over all column
and row permutations reduces to two separate optimizations. One is for
the rows, $ICB(C_R)$, and the other for the columns, $ICB(C_C)$.

From properties 1, 2, and 3, we know that the row permutation which
maximizes $ICB(C_R)$ is the same as the column permutation that
maximizes $ICB(C_C)$. Thus, all we need to do is find a row permutation
that maximizes $ICB(C_R)$, then reorder the rows and columns of C
according to this permutation to maximize ICB(C).

The problem can be stated formally as follows: 

Let $\lambda = [\lambda(1), \lambda(2), \ldots, \lambda(m)]$ denote a permutation
of m columns of C producing the new matrix
$D = [D_{ij}] = [C_{i,\lambda(j)}]$.
Maximization of the summed column bonds $ICB(C_C)$ is given by

\[ \max_{\lambda} \sum_{j=1}^{m} \sum_{i=1}^{m} D_{ij} [D_{i,j-1} + D_{i,j+1}], \]

where $\lambda$ ranges over all m! possible permutations.

This may be transformed into a quadratic assignment problem for which
optimal and suboptimal algorithms have been published [G3]. These
suboptimal algorithms were not used, since they are too time consuming
for large m, i.e., they require operations which rise with the fifth
power of the matrix size.

Now we define a suboptimal method which exploits the 
nearest-neighbor feature [H5] of Property 4. This method is much faster
than the optimal methods and is believed to produce near optimal
orderings. The intercluster bond method is as follows:

A. First compute and save the set of intercolumn bonds for all pairs
(i,j) of columns, i.e.,

\[ \sum_{k=1}^{m} C_{ki} C_{kj} \quad \text{for all } 1 \le i,j \le m,\ i \ne j. \]

B. Pick one of the columns arbitrarily, put it into a list, and set
k=1.

C. For each of the remaining m-k columns, compute the contribution to
the intercluster bond measure for each of the k+1 possible positions to
the left and to the right of each of the k columns already placed in the
list. Place the column that gives the largest incremental contribution
to the intercluster bond measure in its best location in the list.

D. If k=m, stop; otherwise, increment k by 1 and repeat step C.

When the above procedure terminates, simply order the rows and
columns of C in accordance with the list of columns.
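
Steps A through D can be sketched as follows. The sketch is
illustrative only and rests on two assumptions that the text does not
spell out: the incremental contribution of a placement is taken to be
the bonds gained with the new left and right neighbors minus the bond
broken between the two columns that are separated, and ties are
resolved in favor of the first candidate examined. The names
column_bond and icb_order are hypothetical.

```python
def column_bond(C, a, b):
    """Step A: intercolumn bond, the inner product of columns a and b."""
    return sum(C[k][a] * C[k][b] for k in range(len(C)))


def icb_order(C, start=0):
    """Greedy column ordering for the intercluster bond method."""
    m = len(C)
    bond = [[column_bond(C, a, b) for b in range(m)] for a in range(m)]
    order = [start]  # step B: an arbitrary first column
    remaining = set(range(m)) - {start}
    while remaining:  # steps C and D
        best = None
        for col in sorted(remaining):
            for pos in range(len(order) + 1):
                left = bond[order[pos - 1]][col] if pos > 0 else 0
                right = bond[col][order[pos]] if pos < len(order) else 0
                # Inserting between two placed columns breaks their bond.
                broken = bond[order[pos - 1]][order[pos]] if 0 < pos < len(order) else 0
                increment = left + right - broken
                if best is None or increment > best[0]:
                    best = (increment, col, pos)
        _, col, pos = best
        order.insert(pos, col)
        remaining.remove(col)
    return order
```

The returned list is then used to reorder both the rows and the
columns of C. On a matrix with two disjoint groups of sectors, the
members of each group end up adjacent in the list.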

Property 5: 

The time for the execution of the clustering process in step C 
grows as $m^3$.
To see this, note that

\[ \sum_{k=1}^{m-1} (k+1)(m-k) = m^3/6 + m^2/2 - 2m/3. \]

The intercluster bond method will cluster the sectors into disjoint
groups if this is possible.

CHAPTER 6



EXPERIMENTAL RESULTS



6.1 Introduction



The purpose of this chapter is to report on an experimental study
of the paging performance of programs. The objective of this study is
to evaluate the practical restructuring methods developed in Chapters 4
and 5. The evaluation consists of two basic parts. First, the paging
performances produced by the different restructuring methods are
compared and contrasted with one another. Second, the improvements in
paging performance produced by the practical restructuring methods are
compared with the theoretical best and worst improvements as given by
the bounds in Chapter 3.

We have performed experiments, using the IBM System/360 Model 67 at
the Cambridge Scientific Center, on compilers, assemblers and a large
data base program. The results of a specific example will be presented
in detail. We have chosen as an example the restructuring of a highly
modular compiler [A3]. This example is selected because we have
experimental results for all of our restructuring methods applied to
this compiler. The author and Don Hatfield of IBM plan to publish the
results of using some of these methods to restructure a "large data
base" system as soon as our results are completed.

This compiler has 4 phases. Phase 0 is a very small root phase
which simply has Phase 1 read in, and, when Phase 1 is over, has Phase 2
read in, and, when Phase 2 is over, has Phase 3 read in. Each of the
phases is a separate overlay in the sense that they do not share any
address space. Therefore, we may think of Phases 1, 2, and 3 as three 
separate programs. There are between 70 and 100 relocatable sectors per 
phase. For each compilation, we computed three distinct sector traces. 
One trace was for Phase 1, one for Phase 2, and one for Phase 3. In 
particular, from the time that a phase was loaded into the address space 
until its subsequent removal, a full instruction trace of all references 
to the relocatable modules of that phase was recorded. This instruction 
trace and the load address of all the relocatable sectors (modules) are 
sufficient to compute the sector trace. 

In order to compare the effectiveness of the different arrangements 
of sectors into the virtual address space, LRU and OPT paging simulators 
were developed for a single user paging against himself. Input to the 
simulator was a sequence of page requests generated from the full
instruction trace and a new ordering of sectors into the address space.
A modified version of the one pass OPT algorithm by Palermo and Belady
[B6] was used as the OPT paging simulator.

When sectors have been assigned to pages, one problem remains:
what to do about page boundaries? Holes in pages can occur if sectors
do not fit evenly into pages. For most real programs, we have two
alternatives. First, we do not allow sectors to cross page boundaries,
which may cause empty space within the pages. Second, we pack sectors
one after another into the virtual address space, leaving no holes but
allowing the sectors to cross page boundaries. Hatfield [H1] has
reported on the relative success of the latter approach.

For our experiments, we packed sectors one after another into the 
virtual address space, leaving no holes between the sectors. That is, 
given a partition $\Pi$ of the sectors in blocks, we placed the blocks of
the partitions into the virtual address space one after another. The
unconstrained average neighbor weighted bond, UANW, procedure was used
to automatically order the clusters for insertion into the address 
space, unless the clustering procedure produced ordered clusters. 

The next few sections report on the results of the restructuring
experiments performed on the different phases of the compiler. The
basic structure of these experiments on each phase is as follows.

A. A full instruction trace is recorded and mapped into a sector
trace.

B. An intersector reference matrix model is constructed from the
sector trace.

C. A clustering procedure, based on a particular intersector
reference matrix, is used to partition the relocatable sectors into
blocks.

D. The resulting ordered blocks of the partition are inserted into
the address space one after another.

E. The paging performance of the restructured program is simulated
using LRU replacement (sometimes OPT replacement is used). We
chose LRU replacement because so many contemporary virtual memory
systems use some form of this algorithm.

F. The theoretical upper and lower bounds on the paging 
performance are computed by applying the methods of Chapter 3 to 
the sector trace of step A and compared with the performance found 
in step E. 
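
Steps D and E can be illustrated with a simplified sketch. It is not
the simulator used in these experiments, which was driven by the full
instruction trace. Here each sector reference is assumed to touch only
the page that holds the first byte of the sector, even when the sector
crosses a page boundary, and the names pack, lru_page_fetches and
fetches_for_layout are hypothetical.

```python
from collections import OrderedDict


def pack(order, sizes, page_size):
    """Step D: place sectors one after another, leaving no holes.

    Returns the page number holding the first byte of each sector.
    """
    page_of, address = {}, 0
    for sector in order:
        page_of[sector] = address // page_size
        address += sizes[sector]
    return page_of


def lru_page_fetches(page_trace, frames):
    """Step E: count page fetches under demand fetch and LRU replacement."""
    memory = OrderedDict()  # ordered from least to most recently used
    fetches = 0
    for page in page_trace:
        if page in memory:
            memory.move_to_end(page)
        else:
            fetches += 1
            if len(memory) >= frames:
                memory.popitem(last=False)  # evict the least recently used
            memory[page] = True
    return fetches


def fetches_for_layout(order, sizes, page_size, sector_trace, frames):
    """Page fetches of a sector trace under a given ordering of sectors."""
    page_of = pack(order, sizes, page_size)
    return lru_page_fetches([page_of[s] for s in sector_trace], frames)
```

With four 2048-byte sectors, 4096-byte pages, one page frame and the
sector trace a c a c b d b d, the ordering a, b, c, d costs 8 page
fetches while the ordering a, c, b, d costs 2.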

In order to identify the parameters of the page fetch function,
FFp(|Mp|,N,Πa,SOTa,Fd,R_LRU), which are associated with each curve
in the following graphs, we adopt the following conventions.

1. |Mp|, the size of the primary memory, in pages, is used as the
horizontal axis of the graphs. In addition to the values of |Mp|,
the horizontal axis is tagged with the memory size in K bytes
(K = 1024).

2. N, the page size in these experiments, is 4096 bytes.

3. A partition Π of relocatable sectors into clusters is denoted
by Πx or Πy for ease in interpreting the results in the following
figures. Πx is used to denote a "bad" partition, i.e., one which
tends to maximize or produce a relatively large value of FFp. Πy
denotes a "good" partition, i.e., one which tends to minimize the
value of FFp.

A particular value of Πy is denoted by specifying the intersector
reference matrix and the clustering procedure which produced it.
For example,

Πy(W,T=2500,CNN)

is defined to denote the value of Πy which is computed from the
working set matrix, W, with a window size of T=2500, using the
constrained nearest neighbor procedure, CNN.

The intersector reference matrix models used to specify a
particular Πy will be identified in terms of the following symbols:

W = outside working set matrix model
W'= inside working set matrix model
T = window size of working set model
U = LRU sector stack matrix model
D = sector stack distance
H = Hatfield and Gerald matrix model

The clustering procedures used to specify a particular value of Πy
will be one of the following:

CNN = constrained nearest neighbor

CFN = constrained farthest neighbor

CAN = constrained average neighbor

CANW = constrained average neighbor weighted

UNN = unconstrained nearest neighbor

UFN = unconstrained farthest neighbor

UAN = unconstrained average neighbor

UANW = unconstrained average neighbor weighted

HG = Hatfield and Gerald method

SIP = sector interchange procedure

ICB = intercluster bond method

As another example,

Πy(U,D=20,ICB)

represents the partition named Πy when it is computed from U, with D=20,
using the ICB procedure.

In the presentation of these experimental results, we chose to
denote the program structure in terms of Π instead of the sector
ordering SO, because the clustering procedure is clearer when stated in
terms of Π. However, the reader should be aware that the blocks of the
partition are allowed to cross page boundaries in order to eliminate
holes in the address space.

4. A particular value of SOTa will be denoted by SOT1, SOT2, and
SOT3 for the three phases 1, 2, and 3 respectively. Furthermore,
SOTia, SOTib, etc., will represent the sector trace of the i-th phase
from input program a, b, etc., when the distinction is important. For
example, SOT2a denotes the sector trace of Phase 2 from input program
a. Note that all of the sector traces in the simulations are ordered
pairs (S,O) where S is the sector and O is the offset referenced. This
is necessary because we are allowing sectors to cross page boundaries.

5. The fetch and replacement algorithms are denoted as before, i.e.,
Fd, R_LRU, Ro, etc.
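The need for (S,O) pairs can be seen in a small sketch of the address mapping. This is an illustration only, under the assumption that sectors are packed back to back starting at address 0. The names sector_bases and page_trace, and the dictionary of sector sizes, are hypothetical and do not come from the experiments.

```python
from itertools import accumulate

PAGE_SIZE = 4096  # N, the page size in bytes used in these experiments


def sector_bases(order, sizes):
    """Return {sector: base address} when sectors are packed with no holes.

    order -- list of sector names in address-space order
    sizes -- dict mapping sector name to its length in bytes
    """
    starts = [0] + list(accumulate(sizes[s] for s in order))
    return dict(zip(order, starts))  # zip drops the final end address


def page_trace(sot, order, sizes, page_size=PAGE_SIZE):
    """Map a trace of (sector, offset) pairs to a trace of page numbers."""
    bases = sector_bases(order, sizes)
    return [(bases[s] + o) // page_size for s, o in sot]


# Sector B starts at byte 3000 and ends at byte 4999, so it straddles the
# boundary between pages 0 and 1; the offset decides which page is touched.
sizes = {"A": 3000, "B": 2000, "C": 4000}
trace = [("A", 0), ("B", 500), ("B", 1500), ("C", 0), ("C", 3500)]
pages = page_trace(trace, ["A", "B", "C"], sizes)  # [0, 0, 1, 1, 2]
```

Without the offset, the two references to sector B could not be assigned to different pages.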

In order to find a Πx that tends to maximize the value of FFp, we
investigated random sector orderings, sector orderings based on sector
sizes, lexical orderings (i.e., alphabetical on some character in the
sector name), and sector orderings produced by the following procedure,
called BAD. Take the list L of m sectors, ordered according to their
position in the address space under a good program structure, and do the
following to produce a partition Πx of the m relocatable sectors into n
logical pages.

1. Take the first n sectors of L and put each of them into one of
n separate lists.

2. Take the next n sectors of L and put each of them into one
of the above n separate lists.

3. Repeat 2 until there are no more sectors in L. Then,

4. the collection of the n separate lists becomes Πx.
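The BAD procedure can be sketched as a round-robin deal. The text does not say which of the n lists each sector of a group goes to, so the sketch assumes the k-th sector of every group goes to the k-th list. It also ignores sector sizes. The function name bad_partition is hypothetical.

```python
def bad_partition(good_order, n):
    """Deal the sectors of a good ordering into n lists, round-robin.

    good_order -- list L of sectors in their order under a good structure
    n          -- number of logical pages (lists) to produce
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    lists = [[] for _ in range(n)]
    for position, sector in enumerate(good_order):
        # Neighbors in the good ordering land in different lists.
        lists[position % n].append(sector)
    return lists


blocks = bad_partition(["a", "b", "c", "d", "e", "f", "g"], 3)
# blocks == [["a", "d", "g"], ["b", "e"], ["c", "f"]]
```

The effect is that sectors which were adjacent under the good structure are separated, so sectors that are referenced together tend to fall into different logical pages.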

It turned out that all of the above methods of generating Πx usually
produced a Πx that caused the value of FFp to be very large.

6.2 Restructuring Phase 1

Throughout this section we use the same sector trace, SOT1. In
section 6.5 we compare the results of program restructuring over several
sector traces. Our results support the claim of Hatfield and Gerald,
"many commonly used programs are rather insensitive to input data."

However, we did attempt to choose a program for tracing that
contained most of the features of the language and that was relatively
long. That is, this program was not trivial. The sector trace of this
program contained 7,521,205 references. Moreover, |SOT1|=2,001,027,
|SOT3|=3,859,636 and |SOT2|=1,660,542.

The value of Πx is fixed for Figures 7-14 and represents the
program structure B1, which occurs when the sectors are arranged in the
address space according to their size. Even though the structure
produced by the BAD procedure resulted in slightly more page fetches for
most memory sizes, we selected Πx based on the sector lengths (called
B1) because this represents a plausible method of loading sectors used
by some operating systems. This choice of Πx is used as a basis for
illustrating the actual improvement in the paging performance which can
occur for real programs which are restructured according to some Πy.

6.2.1 Constrained Procedures

The curves of Figures 7 and 8 and the lower curves, labeled C, D,
and E, of Figure 9 show the ratio of the page fetch functions
FFp(|Mp|,N,Πx,SOT1,Fd,R_LRU) and FFp(|Mp|,N,Πy,SOT1,Fd,R_LRU) as a
function of primary memory size |Mp| in pages and as a function of Πx
and Πy where Πy is constrained. Πy is constrained when the blocks of
Πy correspond to the clusters produced by any clustering procedure and
the size of these clusters is constrained to be less than or equal to
the page size.

These figures reveal that the orderings of the relocatable sectors 
into primary memory can have substantial influence on the paging 
performance of virtual memory systems. Moreover, they illustrate that 
substantial improvements in paging performance occur over a relatively 
wide range of primary memory sizes.

The degree of improvement in paging performance shown in these figures
(i.e., 7-8) is significantly larger than any previously published
improvements obtained by restructuring. One explanation for this is that
the intersector reference matrix models based on the working set and the
LRU stack distances capture the intersector activity upon which paging
depends. That is, the value, Cij, of the entry in the intersector
reference matrices used in these experiments may have a strong tendency
to be proportional to the number of page fetches which will go away if
sector j is grouped with sector i. In particular, note the improvement
in paging performance depicted by curves E, D, and C of Figure 9, which
use the HG clustering technique on the sector working set intersector
reference matrix. This improvement is about twice as much as that
reported by Hatfield and Gerald [H1] when the same clustering procedure
is applied to the HG intersector reference model. Recall that the HG
intersector reference model is the same as the sector working set model
when T=1.

A => Πy(W,T=1000,ICB)
B => Πy(W,T=2500,ICB)
C => Πy(W,T=1500,HG)
D => Πy(W,T=2500,HG)
E => Πy(W,T=5000,HG)

|SOT1| = 2,001,027

FIGURE 9 FFp(|Mp|,N,Πx,SOT1,Fd,R_LRU)/FFp(|Mp|,N,Πy,SOT1,Fd,R_LRU)
vs |Mp| FOR PHASE 1 OF AED COMPILER

6.2.2 Unconstrained Procedures

The unconstrained clustering procedures presented in Chapter 5 
cluster the relocatable sectors into natural clusters without any 
constraint on the sum of the sector sizes making up a cluster. To date,
no work has been reported in the literature which incorporates this 
rather simple idea into clustering procedures. 

The curves identified by labels A and B of Figure 9 show the 
improvement in paging performance which occurred when natural clusters 
were formed. These natural clusters were produced by the intercluster
bond method, ICB, using the sector working set intersector reference 
model. These curves illustrate that natural clusters can provide 
significantly better improvements in the paging performance than the 
improvement provided by the constrained clustering techniques. 

The curves of Figure 10 (except curve D) show the improvement in
paging performance for several unconstrained clustering techniques. The
curve labelled D in Figure 10 shows the improvement in paging
performance provided by the existing compiler structure.
Recall that all these improvements are relative to the program structure
Πx formed by arranging the sectors into the address space in order of
their sizes. Curve D shows that the existing compiler structure is
substantially better than that provided by Πx and significantly worse
than any of the unconstrained techniques.

A => Πy(W,T=2500,UANW)
B => Πy(W,T=2500,ICB)
C => Πy(U,D=15,UANW)
D => Πy(Compiler)

N = 4096 Bytes
|SOT1| = 2,001,027
Πx = B1

FIGURE 10 FFp(|Mp|,N,Πx,SOT1,Fd,R_LRU)/FFp(|Mp|,N,Πy,SOT1,Fd,R_LRU)
vs |Mp| FOR PHASE 1 OF AED COMPILER

Figure 11 shows the effects of the unconstrained average neighbor
weighted bond procedure UANW on the paging performance as a function of
T for the working set intersector reference model W. The significant
characteristic of the curves shown in Figure 11 is that the
improvements in paging performance are relatively the same over a broad
range of T values.

Note the tendency of the curves in Figure 11 to peak in the center
region of the primary memory sizes. This tendency is due primarily to
the following two "principles" pushing a curve together from both sides.
The first principle is that for small values of |Mp|, one clustering
method "cannot win" over another method. The second principle is that
for large values of |Mp|, one clustering approach "cannot lose" to
another approach. However, in the middle range of the values of |Mp|,
there may be enough primary memory available to contain most of the
sectors referenced close together in time when they are clustered
together into groups. Note that in this region there can be two levels
of clustering for good structures. The first level is that sectors are
clustered together by the clustering procedure. The second level is
that clusters are clustered together by the paging mechanism.

A => Πy(W,T=1000,UANW)
B => Πy(W,T=25000,UANW)
C => Πy(W,T=5000,UANW)

N = 4096 Bytes
|SOT1| = 2,001,027
Πx = B1

FIGURE 11 FFp(|Mp|,N,Πx,SOT1,Fd,R_LRU)/FFp(|Mp|,N,Πy,SOT1,Fd,R_LRU)
vs |Mp| FOR PHASE 1 OF AED COMPILER



6.2.3 Theoretical Bounds 

In Figure 12 the performance for the best program structure, i.e.,
the one produced by Πy(W,T=2500,UANW), is compared with the theoretical
best performance given by Theorem 8. Observe that Table 3 precisely
defines the parameters for the curves shown in Figure 12. Curve B shows
the ratio of the page fetches experienced by the program under the
structure produced by Πy(W,T=2500,UANW) to the theoretical lower bound
on the page fetches. That is, curve B depicts
FFp(|Mp|,N,Πy,SOT1,Fd,R_LRU)/the Lower Bound. This ratio can
never be less than one and would be equal to one when the theoretical
best performance occurred for a given program structure. Figure 12
shows several significant characteristics. The performance produced by
the structure Πy(W,T=2500,UANW) is relatively close to the lower bound
for large regions of primary memory size. Furthermore, it is close to
the lower bound in the primary memory regions of low paging rates. This
latter fact can be seen by observing the curves in Figure 13. Curve D
of Figure 13 shows the number of page fetches for the structure
Πy(W,T=2500,UANW), and curve A shows the theoretical lower bound for the
number of page fetches over all Πy.

FIGURE 12 Comparison of Actual and Theoretical Ratios of FFp
FOR PHASE 1 OF AED COMPILER

Graph A is:

FFp(|Mp|,N,Πx,SOT1,Fd,R_LRU)/FFp(|Mp|,N,Πy,SOT1,Fd,R_LRU)

Graph B is:

FFp(|Mp|,N,Πy,SOT1,Fd,R_LRU) /
{[FFs(|Ms| = f1(|Mp|,N,SS*),SOT1*,Fd,Ro) - A] / [f1(2,N,SS)/2]}

Graph C is:

FFs(|Ms| = |Mp|,SOT1,Fd,R_LRU)/FFp(|Mp|,N,Πx,SOT1,Fd,R_LRU)

Graph D is:

FFp(|Mp|,N,Πx,SOT1,Fd,R_LRU)/FFp(|Mp|,N,Πy,SOT1,Fd,Ro)

Graph E is:

FFp(|Mp|,N,Πy,SOT1,Fd,Ro) /
{[FFs(|Ms| = f1(|Mp|,N,SS*),SOT1*,Fd,Ro) - A] / [f1(2,N,SS)/2]}

where Πx = B1, Πy(W,T = 2500,UANW), N = 4096 Bytes,
|SOT1| = 2,001,027

Note that
[FFs(|Ms| = f1(|Mp|,N,SS*),SOT1*,Fd,Ro) - A] / [f1(2,N,SS)/2]
shown in B and E above is the lower bound of FFp given
in Theorem 6.

Note that FFs(|Ms| = |Mp|,SOT1,Fd,R_LRU)
shown in C above is the upper bound of FFp given by
Theorem 3.

Table 3
Parameters for Curves in Figure 12

Curve B of Figure 12 indicates that the lower bound may be loose
for very small values of |Mp| or that the structure Πy(W,T=2500,UANW)
does not cluster sectors very well for small values of |Mp|. Our
conjecture is that the lower bound may be loose for very small values of
|Mp| since this phenomenon is observed in all of our experiments. This
is not a serious practical drawback, because even for the lower bound
the paging activity is prohibitively large for very small |Mp|. Since
the lower bound is valid over all replacement algorithms, we also
computed the ratio of the performance of the good structure Πy using OPT
replacement to the lower bound. This ratio is curve E of Figure 12.
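For readers who want to reproduce the OPT side of this comparison, the following is a minimal sketch of page fetch counting under OPT replacement, i.e., evicting the resident page whose next reference lies farthest in the future. It is an illustration, not the simulator used here: it assumes a precomputed sequence of page numbers, and the name opt_page_fetches is hypothetical.

```python
import math


def opt_page_fetches(page_trace, memory_pages):
    """Count page fetches under demand fetching with OPT replacement."""
    if memory_pages < 1:
        raise ValueError("memory_pages must be at least 1")
    trace = list(page_trace)

    # next_use[i] is the index of the next reference to trace[i],
    # or infinity if that page is never referenced again.
    next_use = [math.inf] * len(trace)
    upcoming = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = upcoming.get(trace[i], math.inf)
        upcoming[trace[i]] = i

    resident = {}  # resident page -> index of its next reference
    fetches = 0
    for i, page in enumerate(trace):
        if page not in resident:
            fetches += 1
            if len(resident) >= memory_pages:
                # Evict the page whose next use is farthest away.
                victim = max(resident, key=resident.get)
                del resident[victim]
        resident[page] = next_use[i]
    return fetches
```

Because OPT needs the whole trace in advance, it is usable only in simulation, which is how it serves as a reference point in these experiments.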

Curve C of Figure 12 illustrates the ratio of the theoretical upper 
bound given by Theorem 3 to the bad performance. The bad performance is 
the number of page fetches produced with the structure Πx.

The upper bound is relatively close to the "worst" performance
resulting from the structure Πx for most values of |Mp|. For large
values of |Mp| the upper bound is not very tight. The upper bound will
be tight as long as the sectors which are clustered into a page are
never used together when that page is in Mp. However, as the size of Mp
increases, it becomes more and more difficult for this condition to be
satisfied. Hence, the upper bound grows very rapidly for values of |Mp|
approaching the length of the program. However, for values of |Mp| in
the region where the program would probably be run, the upper bound is
reasonable.

Figure 13 shows the number of page fetches given by:

A. the lower bound.

B. the upper bound.

C. the bad structure, Πx.

D. the good structure, Πy(W,T=2500,UANW) under LRU.

E. the good structure, Πy(W,T=2500,UANW) under OPT.

Figure 14 is simply the values for curves A, C, and D of Figure 13 shown
at a much larger scale.

In summary, Figures 9-14 show that the paging performance may vary
by a factor of 12 to 30 for large regions of primary memory size |Mp|.
This occurs when the unconstrained clustering procedures are used in
conjunction with the sector working set and the LRU stack intersector
reference matrices; that is, for Πy(W,T=2500,UANW), Πy(W,T=2500,ICB)
and Πy(U,D=15,UANW). The use of clustering procedures which cluster
sectors into natural clusters can produce program structures which
require significantly fewer page fetches than required by program
structures based on constrained clustering procedures.

FIGURE 13 Total Page Fetches vs |Mp|
FOR PHASE 1 OF AED COMPILER

FIGURE 14 Enlarged Scale for Curves A, C, and D of Figure 13 FOR
PHASE 1 OF AED COMPILER

6.3 Restructuring Phase 2

Figure 15 shows the results of restructuring Phase 2 over sector
trace SOT2, where |SOT2|=1,660,542. Table 4 precisely defines the
curves of Figure 15. The bad order Πx = B2 for Phase 2 is computed by
the procedure BAD, which is compared to the order produced by
Πy(W,T=2500,UANW). The curves of Figure 15 may be interpreted similarly
to those of Figure 12 of Phase 1. The variation in the paging
performance of Phase 2 as a function of program structure is not as
large as that of Phase 1. However, the largest improvement in the
paging performance of Phase 2 occurs when approximately one half of
Phase 2 can fit into primary memory.

See Table 4 for a complete explanation of the curves.

A => FFp(Πx)/FFp(Πy) where Πx = B2 and Πy(W,T=2500,ICB)

B => FFp(Πx)/FFp(Πy) where Πx = B2 and Πy(W,T=2500,UANW)

C => FFp(Πx)/FFp(Πy) where Πx = B2 and Πy = Compiler Order

D => FFp(Πy)/Theor. min FFp where Πy(W,T=2500,UANW)

E => Theor. max FFp/FFp(Πx) where Πx = B2

|SOT2| = 1,660,542






FIGURE 15 Page Fetch

Graph A is:

FFp(|Mp|,N,Πx,SOT2,Fd,R_LRU)/FFp(|Mp|,N,Πy,SOT2,Fd,R_LRU)
Πx = B2 and Πy(W,T = 2500,ICB)

Graph B is:

FFp(|Mp|,N,Πx,SOT2,Fd,R_LRU)/FFp(|Mp|,N,Πy,SOT2,Fd,R_LRU)
Πx = B2 and Πy(W,T = 2500,UANW)

Graph C is:

FFp(|Mp|,N,Πx,SOT2,Fd,R_LRU)/FFp(|Mp|,N,Πy,SOT2,Fd,R_LRU)
Πx = B2 and Πy = Compiler Order

Graph D is:

FFp(|Mp|,N,Πy,SOT2,Fd,R_LRU) /
{[FFs(|Ms| = f1(|Mp|,N,SS*),SOT2*,Fd,Ro) - A] / [f1(2,N,SS)/2]}
Πy(W,T = 2500,UANW)

Graph E is:

FFs(|Ms| = |Mp|,SOT2,Fd,R_LRU)/FFp(|Mp|,N,Πx,SOT2,Fd,R_LRU)
Πx = B2

Table 4
Parameters for Curves in Figure 15

6.4 Restructuring Phase 3

Phase 3 is restructured from a sector trace SOT3 which contained
3,859,636 references. The program structure Πx is a random ordering of
sectors into the virtual address space. Program structures

Πy(W,T=2500,ICB), Πy(W,T=2500,UANW) and Πy(U,D=20,ICB)

produced substantial improvements in the paging performance over
Πx = B3. These improvements are illustrated in curves A, B, and C of
Figure 16. These curves have the highest peaks of any improvements over
sector orderings that we found. Curve D of Figure 16 shows the ratio of
the paging performance obtained from Πx to the performance of the
existing compiler ordering. The theoretical lower and upper bounds are
presented in Figure 17 in the same manner as for Phases 1 and 2.

Now we present a few general comments about Phase 1, 2, and 3. All 
three phases indicate that significant variations in paging performance 
can occur for different arrangements of the relocatable sectors in
virtual memory. The unconstrained clustering procedures, ICB and UANW,
produced the best program performance over all memory sizes for all 
three phases. The constrained procedures are not shown for Phases 2 and 
3 since they produced the same relative improvement in these phases as 
in Phase 1. The theoretical lower bounds are relatively good indicators 
of the best paging performance of all three phases for all but the 
smallest primary memory sizes.

FIGURE 16 FFp(|Mp|,N,Πx,SOT3,Fd,R_LRU)/FFp(|Mp|,N,Πy,SOT3,Fd,R_LRU)
vs |Mp| FOR PHASE 3 OF AED COMPILER

FIGURE 17 FFp(|Mp|,N,Πx,SOT3,Fd,R_LRU)/FFp(|Mp|,N,Πy,SOT3,Fd,R_LRU)
vs |Mp| FOR PHASE 3 OF AED COMPILER

6.5 Effects of Input Data

In order to establish the effect that the input program to be
compiled has on the paging performance, we conducted the following
experiments:

Experiment 1:

A. We took the above sector trace SOT1 and computed the program
structure Πy(W,T=2500,UANW).

B. We measured a second program trace SOT1a which corresponds
to a completely different program and restructured the compiler to
get Πya(W,T=2500,UANW) based on SOT1a.

C. A third sector trace SOT1b was measured, and, based on this
sector trace, the program structure Πyb(W,T=2500,UANW) was
computed.

All three of the program structures, Πy, Πya and Πyb should tend to
minimize the page fetches for the traces SOT1, SOT1a, and SOT1b
respectively. However, will the structures specified by Πya or by Πyb
tend to minimize the page fetches for SOT1?






Figure 18 contains all the information shown in Figure 13 for Phase
1. That is, it shows the value of the page fetch function FFp for
SOT1 and Πy as curve D, and it shows the other curves of Figure 13 for
visual comparison. Curve F in Figure 18 represents the value of FFp as
a function of the same reference behavior SOT1 and Πya. Curve G
illustrates the value of FFp as a function of the same reference
behavior SOT1 and Πyb.

Therefore, the curves D, F, and G represent the paging performances 
of Phase 1 of the compiler for a single sector trace and three different 
partitions of sectors into clusters. The results of this experiment 
reveal that a good program structure generated from one sector trace is 
a good program structure for other sector traces. 

Experiment 2: 

We now describe a second experiment. For Πy, Πya and Πyb from the above
experiment, we use the BAD procedure on each Π to get Πx, Πxa, and Πxb
respectively. Then, using the same sector trace SOT1, the following
ratios are computed and plotted in Figure 19.

A. FFp(..,Πx,SOT1,..)/FFp(..,Πy,SOT1,..)

B. FFp(..,Πxa,SOT1,..)/FFp(..,Πya,SOT1,..)

C. FFp(..,Πxb,SOT1,..)/FFp(..,Πyb,SOT1,..)

B => FFs(|Ms| = |Mp|,SOT1,Fd,R_LRU)
C => FFp(|Mp|,N,Πx,SOT1,Fd,R_LRU)
D => FFp(|Mp|,N,Πy,SOT1,Fd,R_LRU)
E => FFp(|Mp|,N,Πy,SOT1,Fd,Ro)

Πx = B1
Πy(W,T=2500,UANW)
N = 4096 Bytes
|SOT1| = 2,001,027

FIGURE 18 Total Page Fetches vs |Mp| FOR PHASE 1 OF AED
COMPILER

A => FFp(|Mp|,N,Πx,SOT1,Fd,R_LRU)/FFp(|Mp|,N,Πy,SOT1,Fd,R_LRU)
where Πx and Πy(W,T=2500,UANW) are based on SOT1

B => FFp(|Mp|,N,Πxa,SOT1,Fd,R_LRU)/FFp(|Mp|,N,Πya,SOT1,Fd,R_LRU)
where Πxa and Πya(W,T=2500,UANW) are based on SOT1a

C => same as B except Πxb and Πyb are based on SOT1b




FIGURE 19 Comparison of Page Fetch Ratios for Different Program
Structures FOR PHASE 1 OF AED COMPILER

These ratios are the improvements in paging performance over the same
sector trace for three pairs of program structures. Each pair consists
of a BAD structure and a good structure. Furthermore, each pair is
constructed from a different sector trace. However, the possible
improvement in paging performance for each pair is nearly the same.
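A ratio curve of this kind can be computed for every memory size in a single pass over each page trace by using LRU stack distances: a reference is a page fetch for |Mp| = m exactly when its stack distance exceeds m. The sketch below is an illustration under stated assumptions, not the procedure used in these experiments. It assumes the same sector trace has already been mapped to two page traces, one per program structure, and the names lru_fetch_curve and improvement_ratio are hypothetical.

```python
def lru_fetch_curve(page_trace, max_pages):
    """Return [fetches for |Mp| = 1, ..., fetches for |Mp| = max_pages]."""
    stack = []  # LRU stack, most recently used page first
    hits_at = [0] * (max_pages + 1)  # hits_at[d]: references at depth d
    total = 0
    for page in page_trace:
        total += 1
        if page in stack:
            depth = stack.index(page) + 1  # 1-based stack distance
            stack.remove(page)
            if depth <= max_pages:
                hits_at[depth] += 1
        stack.insert(0, page)
    curve, hits = [], 0
    for m in range(1, max_pages + 1):
        hits += hits_at[m]  # references that hit with m page frames
        curve.append(total - hits)
    return curve


def improvement_ratio(bad_trace, good_trace, max_pages):
    """Ratio of page fetches, bad structure over good, for each |Mp|.

    Both traces are assumed non-empty, so no divisor is zero.
    """
    bad = lru_fetch_curve(bad_trace, max_pages)
    good = lru_fetch_curve(good_trace, max_pages)
    return [b / g for b, g in zip(bad, good)]
```

The list.index scan makes this quadratic in the worst case, which is acceptable for a sketch but would be slow for traces with millions of references.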

Experiments 3 and 4: 

Experiments 3 and 4 for Phases 2 and 3 respectively are quite
similar to Experiment 1 for Phase 1. The only difference is that, in 3
and 4, the ratios FFp(..,Πy,SOT2,..)/FFp(..,Πya,SOT2,..) and
FFp(..,Πy,SOT2,..)/FFp(..,Πyb,SOT2,..) are plotted as shown in
Figures 20 and 21 instead of the magnitudes of these values of FFp shown
in Figure 18. In Figure 18 it is difficult to distinguish between the
three curves because of the scale problems. Figures 20 and 21 do away
with the scale problems but do not show the relationship of these values
to the overall picture as is done in Figure 18. From Figures 20 and 21
we observe that a good program structure computed from one sector trace
turns out to be a good program structure for another sector trace.

CHAPTER 7



DISCUSSION AND CONCLUSION 



7.1 Introduction 

This report has presented theoretical and experimental results 
which show that program restructuring has a significant effect on the 
paging performance of virtual memory systems. 



7.2 Summary 

The problem of restructuring programs to improve their paging 
performance in virtual memory systems was presented in Chapter 1.

In Chapter 2 we formalized the notion of the page fetch function
and the sector fetch function. The page fetch function models the
paging behavior, and the sector fetch function models the sectoring
behavior.

In Chapter 3 the sector fetch function was used to produce upper
and lower theoretical bounds on the page fetch function over all
reorderings of the relocatable sectors into the address space.

Intersector reference models based on sector working sets and LRU
stack distances were developed in Chapter 4. In Chapter 5 several
clustering methods were developed which used the intersector reference
models to produce a restructured program.

In Chapter 6 the effect of program restructuring on the paging
performance of real programs was investigated empirically and
theoretically. In particular, we showed that improvements in paging
performance by factors of 20 to 40 are not uncommon for relatively large
regions of primary memory size.



7.3 Further Work

The research reported in this report provides a basis for
additional investigation in several areas of program restructuring.

The work described in this report addresses a problem that is as
hard as the seemingly intractable problems studied by Cook [C5] and Karp
[KB]. Recent work by several people has revealed fast algorithms for
near optimal solutions to some of these problems. The clustering
techniques described in Chapter 5 have been shown to be of value for
particular but not trivial examples that occur in practice. It would be
of considerable interest to know to what extent these techniques can be
relied on over all possible sector traces. Can our techniques be shown
to yield solutions that come within a factor of two of our lower bounds?
If not, are there algorithms that do come near our lower bounds?
Alternatively, can our lower bounds be improved?

We did not investigate the problem of sector duplication in this
thesis. We claim that the results of Chapter 3 can be applied in a
straightforward manner to produce lower bounds on the paging performance 
when sector duplication is allowed. Another related problem is how to 
incorporate sector duplication into the intersector reference models and 
into the clustering procedures. 

Another area is the problem of deciding when it is best for sectors 
to cross page boundaries and when it is best to have holes in pages.

An ongoing research project between the author and Don Hatfield of 
IBM is to use the theoretical results of Chapter 3 to evaluate the 
potential benefit of reprogramming and then restructuring a very large 
data base system. This large data base system has sectors which are 
over 10 pages long. For example, Theorem 1 can be used to predict the
theoretical best paging performance if the large data base system is
broken up into k sectors per page. Thus, the problem is to determine
the k that provides the best theoretical improvement and then use the
magnitude of this improvement as a basis for deciding whether or not
reprogramming is advisable.